Chunk #11 — MATERIALS AND METHODS — Genotype data processing and imputation

Source: Associations between alcohol use disorder polygenic score and remission in participants from high-risk families and the Indiana Biobank.
Embedded: yes

Text

COGA EA samples were genotyped on three different arrays: the Illumina Human1M and OmniExpress 12v1 arrays (Illumina, San Diego, CA) and the SmokeScreen array (Biorealm LLC, Walnut, CA). COGA AA samples were genotyped using the Illumina Human2.5M array (Illumina, San Diego, CA). Genotyping, data processing, and quality control of the COGA samples have been reported previously (Lai et al., 2019a, Lai et al., 2019b). Briefly, a set of 47,000 independent variants (defined as linkage disequilibrium (LD) r² <0.5) that were genotyped on all arrays with high genotyping quality (missing rate <2%, minor allele frequency (MAF) >10%, Hardy-Weinberg Equilibrium (HWE) P-value >0.001) was used to confirm and update the reported family structures. The same set of variants was used to calculate the principal components (PCs) of population stratification using Eigenstrat (Price et al., 2006). Based on the first two PCs, samples that clustered with the European and African samples from the 1000 Genomes Project (Phase 3, version 5, NCBI GRCh37) were classified as EA and AA samples, respectively. Before imputation, variants with palindromic alleles, missing rate >5%, MAF <3%, and HWE P-value <0.0001