Chunk #1 — A PUBLIC AUTISM DATASET

Source: A genome-wide linkage and association scan reveals novel loci for autism.
Embedded: yes

Text

Before merging, we carefully filtered each dataset separately to ensure the highest possible genotype quality for analysis, since technical genotyping artifacts can create false-positive findings. We therefore examined the distribution of χ² values for the highest-quality data, and used a series of quality control (QC) filters designed to identify a robust set of SNPs, including data completeness for each SNP, Mendelian errors per SNP and per family, and a careful evaluation of inflation of association statistics as a function of allele frequency and missing data (see Methods). As 324 individuals were genotyped at both centers, we performed a concordance check to validate our approach. After excluding one sample mix-up, we obtained an overall genotype concordance between the two centers of 99.7% for samples typed on 500K arrays at JHU and 5.0 arrays at Broad, and 99.9% for samples run on 5.0 arrays at both sites. The combined dataset, consisting of 1,031 nuclear families (856 with two parents) and a total of 1,553 affected offspring, was employed for genetic analyses (Supplementary Table 1). These data were publicly released in October 2007 and are directly available from AGRE and NIMH.
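
The concordance check described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names, the data layout (a dictionary keyed by sample and SNP), and the toy genotypes are all assumptions made for illustration. Concordance is taken here as the fraction of identical unordered genotype calls among sample/SNP pairs called at both sites; strand flips and allele-coding differences between array types are deliberately ignored.

```python
from typing import Dict, Optional, Tuple

# Genotype calls keyed by (sample_id, snp_id); None marks a missing call.
# This layout is a hypothetical one chosen for the sketch.
Calls = Dict[Tuple[str, str], Optional[str]]


def normalize(genotype: str) -> str:
    """Treat 'AG' and 'GA' as the same unordered genotype."""
    return "".join(sorted(genotype))


def concordance(site_a: Calls, site_b: Calls) -> Tuple[float, int]:
    """Return (fraction of matching calls, number of calls compared).

    Only sample/SNP pairs with a non-missing call at both sites count.
    """
    compared = 0
    matched = 0
    for key, call_a in site_a.items():
        call_b = site_b.get(key)
        if call_a is None or call_b is None:
            continue  # missing at one site: excluded from the comparison
        compared += 1
        if normalize(call_a) == normalize(call_b):
            matched += 1
    if compared == 0:
        raise ValueError("no overlapping genotype calls to compare")
    return matched / compared, compared


# Toy data (invented, not from the study): two samples, two SNPs.
site_one: Calls = {
    ("s1", "rs1"): "AG",
    ("s1", "rs2"): "CC",
    ("s2", "rs1"): "AA",
    ("s2", "rs2"): None,
}
site_two: Calls = {
    ("s1", "rs1"): "GA",
    ("s1", "rs2"): "CT",
    ("s2", "rs1"): "AA",
    ("s2", "rs2"): "TT",
}

rate, n_compared = concordance(site_one, site_two)
print(f"concordance = {rate:.3f} over {n_compared} calls")
```

Excluding pairs with a missing call at either site keeps the concordance estimate separate from data completeness, which the QC filters assess on its own.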