Chunk #7 — 2. MATERIAL AND METHODS — 2.4. Imputation

Source: Genome-wide survival analysis of age at onset of alcohol dependence in extended high-risk COGA families.
Embedded: yes

Text

We used BEAGLE version 3.3.1 (Browning and Browning, 2007) to impute SNPs that were not genotyped on the Illumina Omni Express array. Because our sample was European American, we used as the reference panel the EUR genotypic data from the August 2010 release of the 1000 Genomes Project, as provided with the BEAGLE 3.3.1 release. To account for imputation uncertainty, we used the mean of the distribution of imputed genotypes, which corresponds to an expected allelic or genotypic count (dosage) for each individual. Only SNPs for which the squared correlation between the best-guess genotype and the allele dosage exceeded 0.3 (r2 > 0.3) were used in the analyses. For individual-level genotype data, we converted the genotype probabilities into most-likely genotypes, retaining genotypes with a probability ≥ 80% (from the gprob metric in BEAGLE) and setting all other genotypes to missing. These most-likely genotypes allowed us to detect genotypic errors within families. The same rigorous quality control process used for genotyped SNPs was also applied to imputed SNPs. A total of 4,058,415 SNPs (MAF > 5%) that passed quality control and Mendelian inheritance checks were used for association analysis.
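The two per-genotype quantities described above, the expected dosage and the thresholded most-likely genotype, can be illustrated with a minimal sketch. It assumes each imputed genotype is available as three probabilities, P(AA), P(AB), P(BB), with dosage counted in copies of allele B. The function names and the example probabilities are hypothetical and are not taken from BEAGLE or from the study pipeline; only the 0.80 threshold comes from the text.

```python
from typing import Optional, Tuple

# Three genotype probabilities in the (assumed) order P(AA), P(AB), P(BB).
GenotypeProbs = Tuple[float, float, float]

# Threshold stated in the text: genotypes with probability >= 80% are retained.
GPROB_THRESHOLD = 0.80


def expected_dosage(probs: GenotypeProbs) -> float:
    """Mean of the genotype distribution: expected number of B alleles (0 to 2)."""
    _p_aa, p_ab, p_bb = probs
    return p_ab + 2.0 * p_bb


def most_likely_genotype(
    probs: GenotypeProbs, threshold: float = GPROB_THRESHOLD
) -> Optional[int]:
    """Return the most probable genotype as a B-allele count (0, 1, or 2),
    or None (missing) when its probability is below the threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only.
    confident = (0.05, 0.90, 0.05)
    ambiguous = (0.50, 0.45, 0.05)
    for label, probs in (("confident", confident), ("ambiguous", ambiguous)):
        print(label, expected_dosage(probs), most_likely_genotype(probs))
```

In this sketch the ambiguous genotype still contributes a dosage to the association analysis, but it is set to missing for the family-based error checks, which operate on discrete genotype calls.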