Chunk #53 — Methods — Manual annotation

Source: GENCODE: the reference human genome annotation for The ENCODE Project.
Embedded: yes

Text

Manual annotation of protein-coding genes, lncRNA genes, and pseudogenes was performed according to the guidelines of the HAVANA group, available at ftp://ftp.sanger.ac.uk/pub/annotation. In summary, the HAVANA group produces annotation based largely on the alignment of transcriptomic (EST and mRNA) and proteomic data from GenBank and UniProt. These data were aligned to the individual BAC clones that make up the reference genome sequence using BLAST (Altschul et al. 1997), with subsequent realignment of the transcript data by Est2Genome (Mott 1997). Transcript and protein data, along with other data useful in their interpretation, were viewed in the Zmap annotation interface. Gene models were manually extrapolated from the alignments by annotators using the otterlace annotation interface (Searle et al. 2004). Alignments were navigated using the Blixem alignment viewer (Sonnhammer and Wootton 2001). Visual inspection of the dot-plot output from the Dotter tool (Sonnhammer and Wootton 2001) was used to resolve any alignment with the genomic sequence that was unclear in, or absent from, Blixem. Short alignments (fewer than 15 bases) that cannot be visualized using Dotter were detected using Zmap DNA Search (essentially a pattern