
Chunk #7 — Methods — Measures — Genotyping, quality control and ancestry estimation

Source
Polygenic risk scores for alcohol involvement relate to brain structure in substance-naïve children: Results from the ABCD study.
Embedded
yes

Text

The following preprocessing steps were conducted with the Ricopili pipeline (18). Single nucleotide polymorphisms (SNPs) with call rates ≥ 0.95 and minor allele frequency (MAF) ≥ 1% were retained. Individuals with high rates of missingness (> 5%) or with autosomal heterozygosity deviation (FHET) outside ± 2 SD were removed. After sample QC, SNPs were further filtered to call rate ≥ 0.98 and Hardy-Weinberg equilibrium p-values > 1E-6 (founders only), which yielded 372,342 SNPs. Sex checks were conducted, with follow-up to reconcile mismatches. Individuals whose data passed the first phase of QC were then checked for relatedness (both known and cryptic), and Mendelian errors were resolved. Next, using data from unrelated individuals (pi-hat ≤ 0.20) and an LD-pruned set of common (MAF > 0.05), non-palindromic SNPs (excluding the MHC and the chromosome 8 inversion region), principal components analysis (PCA) was performed in EIGENSTRAT using the European and African 1000 Genomes Project phase 3 data, yielding a sample of 4,737 European American and 1,232 African individuals. Due to the sensitivity of the PRS approach to admixture, we took a conservative approach and performed stringent exclusion for ancestral
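The threshold-based part of the QC described above can be illustrated with a minimal Python sketch. This is not the Ricopili implementation: the class and function names are hypothetical, the per-SNP and per-sample statistics are assumed to be precomputed, and the "± 2 SD" rule for FHET is interpreted here as a deviation from the sample mean (an assumption, since the text does not state the reference point).

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; values assumed precomputed."""
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg p-value (founders only)


@dataclass
class SampleStats:
    """Hypothetical per-sample summary; values assumed precomputed."""
    sample_id: str
    missing_rate: float  # fraction of SNPs with no call
    fhet: float          # autosomal heterozygosity deviation


def pre_filter_snps(snps):
    """First pass: keep SNPs with call rate >= 0.95 and MAF >= 1%."""
    return [s for s in snps if s.call_rate >= 0.95 and s.maf >= 0.01]


def filter_samples(samples):
    """Drop samples with missingness > 5% or FHET beyond 2 SD.

    Assumption: the 2 SD band is centred on the mean FHET of all
    samples passed in (requires at least two samples).
    """
    mu = mean(s.fhet for s in samples)
    sd = stdev(s.fhet for s in samples)
    return [
        s for s in samples
        if s.missing_rate <= 0.05 and abs(s.fhet - mu) <= 2 * sd
    ]


def post_filter_snps(snps):
    """Second pass: keep SNPs with call rate >= 0.98 and HWE p > 1e-6."""
    return [s for s in snps if s.call_rate >= 0.98 and s.hwe_p > 1e-6]
```

In a real pipeline the SNP call rates would be recomputed after sample removal before the second pass; the sketch skips that step and only shows how the stated thresholds compose.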