Chunk #36 — Results — Genotyping Completeness and Accuracy

Source
Quality control and quality assurance in genotypic data for genome-wide association studies.

Text

Duplicate discordance estimates for individual SNPs can also be used as a SNP quality filter. The problem is to find a discordance threshold that eliminates a large fraction of SNPs with high error rates while retaining a large fraction of those with low error rates. For example, if the mean error rate is 10^-4, we may wish to retain more than 99% of SNPs with error rates below 10^-3 while eliminating as many SNPs as possible with error rates above 10^-2. For the Addiction project, with 60 duplicates, a threshold of >1 discordant call seems appropriate, since it would eliminate 99.9% of SNPs with an error rate of 10^-1, 33.5% with a rate of 10^-2, 0.65% with a rate of 10^-3, and <0.1% with a rate of 10^-4. Figure S10 shows how the probability of observing more than 0, 1, 2, or 3 discordant calls depends on the number of duplicates for different genotyping error rates. These binomial calculations can be used to select the threshold and the number of duplicates needed to achieve a given level of distinction among error rates. At least 30 pairs are indicated for most situations.
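The binomial calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the passage does not say how a per-genotype error rate translates into a per-pair discordance probability, so the sketch assumes that errors in the two calls of a duplicate pair are independent and that a pair is discordant when exactly one call is wrong, giving q = 2e(1 - e). The function names are hypothetical.

```python
from math import comb


def pair_discordance_prob(error_rate):
    """Assumed chance that the two calls of one duplicate pair disagree.

    Assumption (not stated in the passage): errors are independent and a
    pair is discordant when exactly one of its two calls is wrong.
    """
    return 2.0 * error_rate * (1.0 - error_rate)


def prob_more_than(k, n_pairs, error_rate):
    """Binomial probability of seeing more than k discordant pairs for one SNP."""
    q = pair_discordance_prob(error_rate)
    p_at_most_k = sum(
        comb(n_pairs, i) * q**i * (1.0 - q) ** (n_pairs - i)
        for i in range(k + 1)
    )
    return 1.0 - p_at_most_k


if __name__ == "__main__":
    # Fraction of SNPs removed by thresholds >0, >1, >2, >3 discordant calls.
    for n_pairs in (30, 60):
        for rate in (1e-1, 1e-2, 1e-3, 1e-4):
            probs = [prob_more_than(k, n_pairs, rate) for k in range(4)]
            print(n_pairs, rate, " ".join(f"{p:.4f}" for p in probs))
```

Under these assumptions, 60 pairs with a threshold of >1 discordant call gives values close to those quoted in the text (about 33% at an error rate of 10^-2 and roughly 0.65% at 10^-3). A different discordance model would shift the numbers somewhat.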