To avoid preferentially removing rare genotypes or alleles at each marker, we recommend using the per-marker quality scores, rather than the per-genotype quality scores, to select a subset of imputed SNPs for analysis. Overall, we saw a correlation of 0.77 between the estimated and actual accuracy of imputed genotypes for each marker. We also saw a correlation of 0.84 between the r² estimated by our method and the actual r² obtained by comparing experimentally derived allele counts with their imputed estimates. Figure 1 shows the ROC curve [Pepe, 2003] for the two quality measures (estimated accuracy and estimated r²); it indicates that the estimated r² measure is the more effective way to identify poorly imputed markers. In the FUSION GWAS scan [Scott et al., 2007], we used an estimated r² threshold of 0.30 to decide which markers were well imputed and should be included in further analyses. At this threshold, we expect to remove 70% of poorly imputed markers (those where r² with experimental genotypes is <20%) but only 0.50% of better imputed markers (those where r² with experimental genotypes is >50%).
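
The per-marker filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the FUSION scan: the function names, the marker identifiers, and all r² values below are hypothetical, and we assume that a marker is kept when its estimated r² is at or above the threshold (the text does not say whether the boundary is inclusive). The second function shows how the removal rates quoted above could be tallied when experimental genotypes are available for comparison.

```python
from typing import Callable, Dict, List, Tuple

R2_THRESHOLD = 0.30  # estimated-r2 threshold quoted in the text


def split_by_estimated_r2(
    estimated_r2: Dict[str, float], threshold: float = R2_THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Split markers into (kept, removed) using the per-marker estimated r2.

    Assumption: markers with estimated r2 >= threshold are kept.
    """
    kept = [m for m, r2 in estimated_r2.items() if r2 >= threshold]
    removed = [m for m, r2 in estimated_r2.items() if r2 < threshold]
    return kept, removed


def fraction_removed(
    estimated_r2: Dict[str, float],
    actual_r2: Dict[str, float],
    in_group: Callable[[float], bool],
    threshold: float = R2_THRESHOLD,
) -> float:
    """Fraction of markers in a group (defined on actual r2) that the filter removes."""
    group = [m for m, r2 in actual_r2.items() if in_group(r2)]
    if not group:
        raise ValueError("no markers fall in the requested group")
    n_removed = sum(1 for m in group if estimated_r2[m] < threshold)
    return n_removed / len(group)


# Hypothetical toy data: marker -> r2 (made-up values, for illustration only).
estimated = {"marker_a": 0.95, "marker_b": 0.25, "marker_c": 0.31, "marker_d": 0.10}
actual = {"marker_a": 0.90, "marker_b": 0.15, "marker_c": 0.55, "marker_d": 0.05}

kept, removed = split_by_estimated_r2(estimated)

# Removal rate among poorly imputed markers (actual r2 < 0.2)
# and among better imputed markers (actual r2 > 0.5).
poor_rate = fraction_removed(estimated, actual, lambda r2: r2 < 0.2)
good_rate = fraction_removed(estimated, actual, lambda r2: r2 > 0.5)

print("kept:", kept)
print("removed:", removed)
print("fraction of poorly imputed markers removed:", poor_rate)
print("fraction of better imputed markers removed:", good_rate)
```

Because the filter acts on whole markers rather than on individual genotype calls, every genotype at a retained marker is kept, including the rare ones that a per-genotype quality cutoff would tend to discard.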