It is particularly difficult to automate the selection of a single representative isoform for a gene, and all large-scale genomic analyses and databases such as Ensembl (16) and SwissProt (17) sidestep this problem by simply selecting the longest isoform as the main variant. Although this is a safe choice and is often correct, we have shown that it is not always the best strategy: only ∼75% of the isoforms selected in this way are likely to be principal (18).
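
For illustration only, the longest-isoform heuristic described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used by any particular database: the function name `longest_isoform_per_gene`, the `(gene_id, isoform_id, sequence)` record layout, the identifiers and the sequences are all hypothetical, and ties are broken by keeping the first isoform encountered.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (gene_id, isoform_id, protein_sequence).
Record = Tuple[str, str, str]


def longest_isoform_per_gene(records: Iterable[Record]) -> Dict[str, str]:
    """Map each gene ID to the ID of its longest protein isoform.

    Ties are resolved by keeping the first isoform seen (an assumption
    made for this sketch).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for gene_id, isoform_id, sequence in records:
        current = best.get(gene_id)
        # Replace the stored isoform only if this one is strictly longer.
        if current is None or len(sequence) > current[1]:
            best[gene_id] = (isoform_id, len(sequence))
    return {gene: isoform for gene, (isoform, _) in best.items()}


# Toy data with made-up identifiers and sequences.
records = [
    ("GENE_A", "GENE_A-201", "MKTAYIAKQR"),
    ("GENE_A", "GENE_A-202", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("GENE_B", "GENE_B-201", "MSDNE"),
]
print(longest_isoform_per_gene(records))
```

The sketch makes the limitation plain: sequence length is the only criterion, so an isoform is chosen whether or not it is the principal one.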