Chunk #54 — Methods — Manual annotation

Source
GENCODE: the reference human genome annotation for The ENCODE Project.

Text

Wootton 2001) was used to resolve any alignment with the genomic sequence that was unclear or absent from Blixem. Short alignments (fewer than 15 bases) that could not be visualized using Dotter were detected using Zmap DNA Search (essentially a pattern-matching tool; http://www.sanger.ac.uk/resources/software/zmap/). The construction of exon–intron boundaries required the presence of canonical splice sites, and any deviation from this rule was given a clear explanatory tag. All nonredundant spliced transcripts at an individual locus were used to build transcript models, and each splice variant was assigned an individual biotype based on its putative functional potential. Once the correct transcript structure had been ascertained, the protein-coding potential of the transcript was determined on the basis of similarity to known protein sequences, the sequences of orthologous and paralogous proteins, the presence of Pfam functional domains (Finn et al. 2010), possible alternative ORFs, the presence of retained intronic sequence, and the likely susceptibility of the transcript to nonsense-mediated mRNA decay (NMD; Lewis et al. 2003).
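Two of the transcript-model checks described above, canonical splice sites at exon–intron boundaries and NMD susceptibility, can be illustrated with a minimal Python sketch. This is not the HAVANA or Zmap implementation. The function names, the plus-strand 0-based half-open exon coordinates, the assumption that GT...AG is the only canonical donor/acceptor pair, and the use of the widely cited 50-nucleotide heuristic (a stop codon lying more than about 50 nt upstream of the last exon–exon junction marks a likely NMD target) are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical transcript model (assumption): exons are 0-based, half-open
# (start, end) intervals on the plus strand of `genome`, listed 5' to 3'.
Exon = Tuple[int, int]

CANONICAL_SITE = ("GT", "AG")  # assumed canonical donor/acceptor dinucleotides
STOP_CODONS = {"TAA", "TAG", "TGA"}


def tag_splice_sites(genome: str, exons: List[Exon]) -> List[str]:
    """Return one tag per intron: 'canonical' or a note naming the deviation."""
    tags = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        donor = genome[left_end:left_end + 2].upper()           # first 2 nt of intron
        acceptor = genome[right_start - 2:right_start].upper()  # last 2 nt of intron
        if (donor, acceptor) == CANONICAL_SITE:
            tags.append("canonical")
        else:
            tags.append(f"non_canonical:{donor}-{acceptor}")
    return tags


def first_in_frame_stop(mrna: str, cds_start: int) -> Optional[int]:
    """Transcript index of the first stop codon read in frame from cds_start."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_nmd_candidate(genome: str, exons: List[Exon], cds_start: int,
                     threshold: int = 50) -> bool:
    """Flag a transcript whose stop codon ends more than `threshold` nt
    upstream of the last exon-exon junction (hypothetical 50-nt heuristic)."""
    if len(exons) < 2:
        return False  # no exon-exon junction, so the heuristic does not apply
    mrna = "".join(genome[s:e] for s, e in exons).upper()  # spliced transcript
    stop = first_in_frame_stop(mrna, cds_start)
    if stop is None:
        return False
    last_junction = sum(e - s for s, e in exons[:-1])  # in transcript coordinates
    return last_junction - (stop + 3) > threshold


if __name__ == "__main__":
    exon1 = "ATGAAATAA" + "C" * 60  # in-frame stop (TAA) at transcript index 6
    intron = "GT" + "A" * 10 + "AG"
    genome = exon1 + intron + "C" * 20 + intron + "C" * 30
    exons = [(0, 69), (83, 103), (117, 147)]
    print(tag_splice_sites(genome, exons))               # ['canonical', 'canonical']
    print(is_nmd_candidate(genome, exons, cds_start=0))  # True (stop ends 80 nt before last junction)
```

Measuring the distance from the end of the stop codon is only one convention; implementations differ in the exact reference point and threshold, so `threshold` is left as a parameter rather than fixed.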