
Chunk #13 — MATERIALS AND METHODS — Genotyping, Ancestry and Imputation:

Source
Genome-wide admixture mapping of DSM-IV alcohol dependence, criterion count, and the self-rating of the effects of ethanol in African American populations.

Text

Detailed information about data processing and QC applied in each study was reported previously (Lai, Wetherill, Bertelsen, et al., 2019; Lai, Wetherill, Kapoor, et al., 2019). To identify duplicate samples among studies, confirm the reported pedigree structure, and calculate principal components (PCs) representing population stratification, all available data from COGA, SAGE, Yale-Penn, and NIAAA were combined. Variants that were common (minor allele frequency (MAF) >10% in the combined sample), independent (r² <0.5), and of high quality (missing rate <2% and Hardy-Weinberg equilibrium (HWE) P-value >0.001) were then used to identify duplicate samples and confirm the reported pedigree structure using PLINK (Chang et al., 2015; Purcell et al., 2007). When the same individual was included in multiple datasets (i.e., in both COGA and SAGE), the duplicate sample was removed from the study with less phenotypic information and fewer family members (e.g., SAGE). Family structures were updated as needed. Using the genetically confirmed pedigrees, Pedcheck (O’Connell & Weeks, 1998) was used to identify Mendelian errors, and inconsistencies were removed. The same set of variants was used to estimate PCs using Eigenstrat (Price et al.,
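
The per-variant thresholds described above (MAF >10%, missing rate <2%, HWE P-value >0.001) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used PLINK. The function names `variant_stats` and `passes_qc` and the genotype coding (0/1/2 allele counts, `None` for a missing call) are hypothetical. The HWE P-value here comes from a simple 1-degree-of-freedom chi-square goodness-of-fit test, swapped in for brevity and not necessarily the test used in the study. LD pruning (r² <0.5) is not covered.

```python
import math


def variant_stats(genotypes):
    """Return (missing_rate, maf, hwe_p) for one variant.

    genotypes: list of 0/1/2 counts of one allele per sample, None if missing
    (hypothetical coding chosen for this illustration).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    missing_rate = 1 - n / len(genotypes)
    if n == 0:
        return missing_rate, 0.0, 0.0

    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_bb + n_ab) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    maf = min(p, q)
    if maf == 0:
        return missing_rate, maf, 1.0  # monomorphic: HWE test is uninformative

    # Chi-square goodness-of-fit against Hardy-Weinberg expected counts.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return missing_rate, maf, hwe_p


def passes_qc(genotypes, maf_min=0.10, miss_max=0.02, hwe_min=0.001):
    """Apply the three per-variant thresholds quoted in the text."""
    missing_rate, maf, hwe_p = variant_stats(genotypes)
    return maf > maf_min and missing_rate < miss_max and hwe_p > hwe_min
```

For example, `passes_qc([0] * 25 + [1] * 50 + [2] * 25)` keeps a common variant whose genotype counts match Hardy-Weinberg proportions exactly, while a variant with no heterozygotes, such as `[0] * 50 + [2] * 50`, fails the HWE threshold.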