MATERIALS AND METHODS: Target dataset

Source: Evaluating risk for alcohol use disorder: Polygenic risk scores and family history.

Details of genotyping, data processing, and quality control for the COGA samples have been reported previously (Lai et al., 2019; Lai et al., 2020). Briefly, COGA samples of European ancestry were genotyped on different arrays: the Illumina Human1M and OmniExpress 12v1 arrays (Illumina, San Diego, CA) and the SmokeScreen array (Biorealm LLC, Walnut, CA). To verify the reported family structures, we used a set of 47,000 independent variants (linkage disequilibrium (LD) r² < 0.5) that were genotyped on all arrays with high genotyping quality (missing rate < 2%, minor allele frequency (MAF) > 10%, Hardy-Weinberg equilibrium (HWE) P-value > 0.001), and updated family structures where necessary. We also used these 47,000 variants to calculate principal components (PCs) of population stratification with Eigenstrat (Price et al., 2006). Samples that clustered with the European samples from the 1000 Genomes Project on the first two PCs were considered to be of European ancestry. Before imputation, we excluded variants with strand-ambiguous A/T or C/G alleles, missing rate > 5%, MAF < 3%, or HWE P-value < 0.0001. SHAPEIT2 (Delaneau et al., 2013) was used to phase the haplotypes and Minimac3 (Das