Chunk #9 — Materials and Methods — HapMap 3 cross-validation experiments

Source
Genotype imputation with thousands of genomes.

Text

Once every individual in a panel had been masked and imputed, we assessed accuracy at each SNP as the squared Pearson correlation ($R^2$) between the masked genotypes, which take values in {0, 1, 2}, and the imputed allele dosages (also known as posterior mean genotypes), which take values in [0, 2]. The allele dosage is defined for each genotype $G$ as $\sum_{x=0}^{2} x \Pr(G = x)$, where $\Pr(G = x)$ is a marginal posterior probability generated by an imputation method. Once the correlation $R^2$ had been measured for every masked SNP, we calculated the mean $R^2$ across SNPs and reported this as a scalar summary of imputation accuracy in that cross-validation experiment. In rare situations, the correlation at a SNP was undefined because the imputation produced identical allele dosages for all individuals. In these cases, we set $R^2 = 0$ to capture the intuition that there would be no power to detect an effect at such SNPs. We note that the HapMap 3 samples were phased together in continental groups, which implies that the absolute accuracies in this experiment may be slightly optimistic. However, our main
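
The accuracy measure described in this passage can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the function names (`allele_dosage`, `snp_r2`, `mean_r2`) and the array layout (SNPs by individuals by three genotype probabilities) are hypothetical. It also assigns $R^2 = 0$ when the masked genotypes themselves are constant, a case the text does not discuss.

```python
import numpy as np


def allele_dosage(posteriors):
    """Posterior mean genotype: sum over x in {0, 1, 2} of x * Pr(G = x).

    `posteriors` has a trailing axis of length 3 holding
    Pr(G=0), Pr(G=1), Pr(G=2) (assumed layout).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    return posteriors @ np.array([0.0, 1.0, 2.0])


def snp_r2(masked, dosage):
    """Squared Pearson correlation between masked genotypes and dosages at one SNP."""
    masked = np.asarray(masked, dtype=float)
    dosage = np.asarray(dosage, dtype=float)
    # The correlation is undefined when the dosages are identical for all
    # individuals; the text sets R^2 = 0 in that case. Treating constant
    # masked genotypes the same way is an assumption of this sketch.
    if dosage.max() == dosage.min() or masked.max() == masked.min():
        return 0.0
    dm = masked - masked.mean()
    dd = dosage - dosage.mean()
    return float((dm @ dd) ** 2 / ((dm @ dm) * (dd @ dd)))


def mean_r2(masked_matrix, posterior_array):
    """Mean per-SNP R^2 across all masked SNPs.

    masked_matrix:   shape (n_snps, n_individuals), values in {0, 1, 2}
    posterior_array: shape (n_snps, n_individuals, 3)
    """
    dosages = allele_dosage(posterior_array)
    return float(np.mean([snp_r2(g, d) for g, d in zip(masked_matrix, dosages)]))


# Toy data: SNP 0 is imputed perfectly; SNP 1 gets the same posterior for everyone.
masked = np.array([[0, 1, 2, 1],
                   [0, 1, 2, 1]])
perfect = np.eye(3)[masked[0]]                    # one-hot posteriors
flat = np.tile([0.25, 0.5, 0.25], (4, 1))         # dosage 1.0 for every individual
posteriors = np.stack([perfect, flat])
summary = mean_r2(masked, posteriors)
```

In this toy input the first SNP contributes an $R^2$ of 1 and the second falls into the undefined case and contributes 0, so the summary is their average.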