METHODS — PRIMARY STUDY SAMPLES — Merged dataset for primary screening

Source: A genome-wide linkage and association scan reveals novel loci for autism.

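As a reading aid for the paragraph below, here is a minimal Python sketch (not the authors' code) of the two quantities behind the QC decisions. The first is the per-SNP TDT statistic, χ² = (b − c)²/(b + c), with transmission odds ratio b/c. The second is the genomic inflation factor λ, taken here as the median observed χ² divided by the median of a 1-df χ² distribution (about 0.455). The sketch then encodes the AGRE and NIMH thresholds as stated in the text. The `SnpStats` record, its field names, all function names, and the boundary handling of each threshold (≤ vs <) are hypothetical. The text does not say whether the stricter rare-SNP missingness cap was also applied to NIMH, so the sketch does not apply it there.

```python
from dataclasses import dataclass
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution, (z_0.75)^2 ~= 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary record; field names are illustrative."""
    name: str
    maf: float          # minor allele frequency
    missing: float      # fraction of missing genotypes (1 - completeness)
    mendel_errors: int  # Mendelian errors (MEs) observed across families
    b: int              # heterozygous parents transmitting the minor allele
    c: int              # heterozygous parents transmitting the major allele

    @property
    def tdt_chi2(self) -> float:
        """TDT statistic (b - c)^2 / (b + c), 1 df; 0 if no informative parents."""
        n = self.b + self.c
        return (self.b - self.c) ** 2 / n if n else 0.0

    @property
    def odds_ratio(self) -> float:
        """Transmission odds ratio b / c; < 1 means minor-allele under-transmission."""
        if self.c == 0:
            # No non-transmissions: infinite OR, or undefined if also b == 0
            return float("inf") if self.b else float("nan")
        return self.b / self.c


def genomic_lambda(snps) -> float:
    """Inflation factor: median observed TDT chi-square / expected 1-df median."""
    return median(s.tdt_chi2 for s in snps) / CHI2_1DF_MEDIAN


def lambda_by_stratum(snps) -> dict:
    """Lambda within MAF (< or >= 0.05) x transmission-direction strata."""
    strata = {}
    for s in snps:
        key = ("MAF<0.05" if s.maf < 0.05 else "MAF>=0.05",
               "OR<1" if s.odds_ratio < 1 else "OR>=1")
        strata.setdefault(key, []).append(s)
    return {k: genomic_lambda(v) for k, v in strata.items()}


def passes_agre_qc(s: SnpStats) -> bool:
    """AGRE filters from the text; boundary handling is assumed."""
    if s.missing > 0.02 or s.mendel_errors >= 10:  # 98% completeness, < 10 MEs
        return False
    if s.maf < 0.05 and s.odds_ratio < 1:          # rare SNP, minor allele under-transmitted
        return s.missing < 0.005                   # much stricter missingness cap
    return True


def passes_nimh_qc(s: SnpStats) -> bool:
    """NIMH filters from the text: 96% completeness, fewer than 4 MEs."""
    return s.missing <= 0.04 and s.mendel_errors < 4


# Usage with made-up numbers (not data from the study):
snps = [
    SnpStats("snp_a", maf=0.30, missing=0.010, mendel_errors=2, b=120, c=100),
    SnpStats("snp_b", maf=0.03, missing=0.010, mendel_errors=1, b=10, c=25),
    SnpStats("snp_c", maf=0.03, missing=0.002, mendel_errors=1, b=10, c=25),
    SnpStats("snp_d", maf=0.20, missing=0.030, mendel_errors=0, b=50, c=50),
]
kept = [s.name for s in snps if passes_agre_qc(s)]  # ["snp_a", "snp_c"]
print(kept, lambda_by_stratum(snps))
```

Stratifying by MAF and by transmission direction, as `lambda_by_stratum` does, is what separates the pattern the text describes: inflation confined to rare SNPs with OR < 1, which a single genome-wide λ would blur.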
Before merging data, we examined the distribution of chi-square values and applied a series of quality control (QC) filters designed to identify a robust set of SNPs. We found that filtering AGRE genotypes to 98% completeness and fewer than 10 Mendelian errors (MEs) was sufficient to remove SNPs that artificially inflated the chi-square distribution among SNPs with minor allele frequency (MAF) > 0.05. For MAF < 0.05, we observed much greater inflation (λ = 1.17), due entirely to a strong excess of SNPs showing under-transmission of the minor allele (OR < 1). Although the same filters yielded high-quality results for rare SNPs with over-transmission of the minor allele (λ = 1.04), rare SNPs with OR < 1 required much stricter filtering (missing data < 0.005). This is not unexpected given a well-documented bias in the TDT: if genotype calls are preferentially missing for heterozygotes or rare homozygotes, significant but artificial over-transmission of the common allele, and hence apparent under-transmission of the minor allele, is expected 28,29. To achieve comparable quality for the NIMH dataset, we filtered on 96% completeness and fewer than 4 MEs. Our final QQ plot