Chunk #28 — RESULTS AND DISCUSSION — Efficiency of annotation on diverse genomes

Source: ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.


Next, we tested ANNOVAR on ∼9 million genetic variants identified in HapMap subjects from the 1000 Genomes Project, and discovered ∼53 000, ∼78 000 and ∼63 000 exonic variants in the CEU, YRI and JPT+CHB populations, respectively (Table 2). By comparison, analysis of dbSNP data indicated that 1.4% of its variants fall within exonic regions of the genome, a higher fraction than in the 1000 Genomes Project data. This suggests a potential ascertainment bias in dbSNP toward functional SNPs, possibly because dbSNP includes variants from many exon sequencing studies. Furthermore, we tested ANNOVAR on ∼15 million SNPs in the mouse genome (that is, variants that differ between mouse strains). We identified 157 745 exonic variants (∼1.1%), a slightly higher frequency than that observed in the 1000 Genomes Project. On average, annotation takes less than 1 min per million SNPs, so gene-based annotation of many hundreds of genomes in a day is feasible on a single personal computer.
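The percentages and the throughput claim above follow from simple arithmetic. The sketch below reproduces the mouse exonic fraction from the reported counts and estimates daily throughput. The figure of ∼4 million variants per human genome is an assumption added for illustration (not stated in the text), and the helper names `exonic_fraction` and `genomes_per_day` are hypothetical.

```python
# Back-of-the-envelope checks for the figures quoted above.
# Inputs marked ASSUMPTION are illustrative and not taken from the paper.

MINUTES_PER_DAY = 24 * 60


def exonic_fraction(n_exonic: int, n_total: int) -> float:
    """Fraction of variants that fall in exonic regions."""
    return n_exonic / n_total


def genomes_per_day(variants_per_genome: float, minutes_per_million: float) -> float:
    """How many genomes can be annotated in 24 h at a given throughput."""
    minutes_per_genome = variants_per_genome / 1_000_000 * minutes_per_million
    return MINUTES_PER_DAY / minutes_per_genome


# Mouse: 157 745 exonic variants out of ~15 million SNPs.
mouse = exonic_fraction(157_745, 15_000_000)
print(f"mouse exonic fraction: {mouse:.2%}")  # prints "mouse exonic fraction: 1.05%"

# ASSUMPTION: ~4 million variants per genome. The text reports < 1 min per
# million SNPs, so using 1 min gives a conservative (lower-bound) estimate.
per_day = genomes_per_day(4_000_000, 1.0)
print(f"genomes per day: {per_day:.0f}")  # prints "genomes per day: 360"
```

The 1.05% result rounds to the ∼1.1% quoted in the text, the gap arising from the approximate ∼15 million denominator. At the worst-case rate of 1 min per million SNPs, a single machine would still handle several hundred genomes of this assumed size per day.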
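Annotating millions of variants per minute requires each "is this position exonic?" query to be cheap. One common way to achieve this, shown here as a sketch and not necessarily how ANNOVAR indexes its gene models, is to merge exon intervals per chromosome and answer each query with a binary search, giving O(log n) time per variant. The `ExonIndex` class and the exon coordinates below are hypothetical; a real run would load coordinates from a gene annotation such as RefSeq.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


class ExonIndex:
    """Per-chromosome sorted, merged exon intervals (1-based, inclusive).

    HYPOTHETICAL helper for illustration only.
    """

    def __init__(self, exons: Dict[str, List[Tuple[int, int]]]):
        self.starts: Dict[str, List[int]] = {}
        self.ends: Dict[str, List[int]] = {}
        for chrom, intervals in exons.items():
            # Merge overlapping or adjacent exons (e.g. from alternative
            # transcripts) so a single binary search suffices per query.
            merged: List[List[int]] = []
            for s, e in sorted(intervals):
                if merged and s <= merged[-1][1] + 1:
                    merged[-1][1] = max(merged[-1][1], e)
                else:
                    merged.append([s, e])
            self.starts[chrom] = [s for s, _ in merged]
            self.ends[chrom] = [e for _, e in merged]

    def is_exonic(self, chrom: str, pos: int) -> bool:
        starts = self.starts.get(chrom)
        if not starts:
            return False
        # Index of the last merged interval starting at or before pos.
        i = bisect_right(starts, pos) - 1
        return i >= 0 and pos <= self.ends[chrom][i]


# Made-up exons: the first two overlap and merge into [100, 260].
idx = ExonIndex({"chr1": [(100, 200), (150, 260), (500, 600)]})
variants = [("chr1", 120), ("chr1", 300), ("chr1", 600), ("chr2", 150)]
flags = [idx.is_exonic(c, p) for c, p in variants]
print(flags)  # prints [True, False, True, False]
print(sum(flags) / len(flags))  # exonic fraction of this toy set: 0.5
```

Merging intervals up front trades a one-time sort for constant-size work per query, which is the property that matters when the same gene model is reused across many genomes.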