
Chunk #10 — Results — Overview of data generation, alignment and variant discovery — Calibration, local realignment and assembly

Source
A map of human genome variation from population-scale sequencing.

Text

The quality of variant calls is influenced by many factors including the quantification of base calling error rates in sequence reads, the accuracy of local read alignment and the method by which individual genotypes are defined. The project introduced key innovations in each of these areas (see Supplementary Information). First, base quality scores reported by the image processing software were empirically recalibrated by tallying the proportion that mismatched the reference sequence (at non-dbSNP sites) as a function of the reported quality score, position in read and other characteristics. Second, at potential variant sites local realignment of all reads was performed jointly across all samples, allowing for alternative alleles that contained indels. This realignment step substantially reduced errors, because local misalignment, particularly around indels, can be a major source of error in variant calling. Finally, by initially analysing the data with multiple genotype and variant calling algorithms and then generating a consensus of these results, the project reduced genotyping error rates by 40-50% compared to those currently achievable using any one of the methods alone (Supplementary Figure 1).
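
The empirical recalibration step described above can be illustrated with a minimal sketch. This is not the project's actual implementation: the function name `recalibrate`, the input tuple layout, and the add-one pseudocount are assumptions made for illustration. The sketch bins only by reported quality score, whereas the text says the real procedure also conditioned on position in read and other characteristics. It tallies mismatches against the reference at sites outside a known-variant set (standing in for dbSNP) and converts the observed error rate back to the Phred scale, Q = -10 log10(p).

```python
import math
from collections import defaultdict


def recalibrate(observations, known_variant_sites=frozenset()):
    """Map each reported quality score to an empirical Phred score.

    observations: iterable of (reported_q, site, is_mismatch) tuples, one per
        sequenced base (hypothetical layout chosen for this sketch).
    known_variant_sites: sites to skip, standing in for dbSNP positions,
        where a mismatch may be a real variant rather than an error.
    """
    # reported_q -> [mismatch count, total count]
    counts = defaultdict(lambda: [0, 0])
    for reported_q, site, is_mismatch in observations:
        if site in known_variant_sites:
            continue
        tally = counts[reported_q]
        tally[0] += int(is_mismatch)
        tally[1] += 1

    table = {}
    for reported_q, (mismatches, total) in counts.items():
        # Add-one pseudocount (an assumption of this sketch) keeps the
        # error rate strictly between 0 and 1 for sparsely populated bins.
        error_rate = (mismatches + 1) / (total + 2)
        table[reported_q] = -10 * math.log10(error_rate)
    return table
```

For example, if bases reported at Q30 mismatch the reference about 1% of the time at non-variant sites, the table maps 30 to roughly 20, indicating that the reported scores were overconfident for that bin.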