Chunk #8 — GENOME‐WIDE ASSOCIATION STUDIES (GWAS) — Available genetic data, quality control, and imputation

Source: The collaborative study on the genetics of alcoholism: Genetics.
Embedded: yes

Text

Over time, COGA samples were genotyped on four different arrays in five batches (summarized in Table 1), which have now been combined into a single dataset for analysis [61]. To remove duplicates and confirm pedigree structures, all 12,145 samples† were combined, a set of 47,000 variants genotyped on all arrays was selected, and identity‐by‐descent was calculated using PLINK [62, 63]. The selected variants were common (minor allele frequency (MAF) > 10%), independent (R² < 0.5), and high quality (missing rate < 2% and Hardy–Weinberg equilibrium (HWE) p‐value > 0.001). Mendelian error checking with the revised pedigree structures was performed using Pedcheck [64], and inconsistent genotypes were set to missing. The same 47,000 variants were also used to calculate principal components (PCs) using Eigenstrat [65] and data from the 1000 Genomes Project (Phase 3, version 5) [66]. Based on the first two PCs, each individual was first assigned to African ancestry (AA), European ancestry (EA), or Other. Then, capitalizing on the family‐based recruitment strategy, each family was assigned the ancestral population held by the majority of its members' individual‐based assignments, to facilitate downstream family‐based analyses.
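The variant thresholds and the family‐level majority rule described above can be illustrated with a minimal Python sketch. It is not the COGA pipeline, which relied on PLINK, Pedcheck, and Eigenstrat. The `Variant` class, its field names, the function names, and the example data are all hypothetical. Pruning for independence (R² < 0.5) is omitted, and because the text does not say how ties within a family were resolved, the alphabetical tie‐break used here is purely an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary; field names are illustrative."""
    name: str
    maf: float            # minor allele frequency
    missing_rate: float   # fraction of samples without a genotype call
    hwe_p: float          # Hardy-Weinberg equilibrium p-value
    on_all_arrays: bool   # genotyped on every array


def passes_qc(v: Variant) -> bool:
    """Apply the thresholds quoted in the text (LD pruning is not shown)."""
    return (
        v.on_all_arrays
        and v.maf > 0.10
        and v.missing_rate < 0.02
        and v.hwe_p > 0.001
    )


def assign_family_ancestry(individuals):
    """Return {family_id: majority label} from (person_id, family_id, label) rows.

    Assumption: ties are broken alphabetically; the source does not specify this.
    """
    counts = {}
    for _person_id, family_id, label in individuals:
        counts.setdefault(family_id, Counter())[label] += 1
    return {
        family_id: min(c, key=lambda label: (-c[label], label))
        for family_id, c in counts.items()
    }


# Made-up example data, for illustration only.
variants = [
    Variant("v1", 0.25, 0.01, 0.40, True),    # passes every filter
    Variant("v2", 0.05, 0.00, 0.90, True),    # too rare
    Variant("v3", 0.30, 0.05, 0.90, True),    # too much missing data
    Variant("v4", 0.30, 0.01, 0.0001, True),  # fails HWE
    Variant("v5", 0.30, 0.01, 0.50, False),   # not on all arrays
]
kept = [v.name for v in variants if passes_qc(v)]

individuals = [
    ("p1", "F1", "EA"),
    ("p2", "F1", "EA"),
    ("p3", "F1", "Other"),
    ("p4", "F2", "AA"),
    ("p5", "F2", "EA"),
]
family_ancestry = assign_family_ancestry(individuals)

print(kept)
print(family_ancestry)
```

Assigning one label per family keeps relatives in the same analysis group even when an individual's PC‐based assignment differs from that of the rest of the family.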