
Chunk #6 — Methods — Genotyping and quality control

Source
Genetic risk for major depressive disorder and loneliness in sex-specific associations with coronary artery disease.

Text

A subset of BioVU patients (n = 24,262) was genotyped on the Illumina MEGAEX platform, which contains more than two million markers, as part of various institutional and investigator-initiated projects. Quality control proceeded as previously described [19]. Genotypes were phased with SHAPEIT [20] and imputed with IMPUTE4 [21] using the 1000 Genomes Phase I reference panel, and variants with INFO < 0.3 were excluded. A subset of SNPs in linkage disequilibrium was used to calculate relatedness and principal components of ancestry using multidimensional scaling in PLINK v1.9 [22]. We randomly removed one individual from each pair of highly related individuals (PI_HAT > 0.1) to avoid spurious results driven by cryptic relatedness, and restricted the sample to a homogeneous population of European descent, defined by principal components of ancestry, to avoid population stratification effects. This left 18,385 individuals for analysis. Samples were genotyped in five batches, and variants were removed if allele frequencies differed significantly (P < 5 × 10⁻⁵) between any batch and the rest of the sample. Finally, we filtered multiallelic and structural variants, converted dosage data to hard genotype calls, and excluded variants with certainty < 0.9 or INFO < 0.95, resulting in 5,218,407 high-quality autosomal SNPs.
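
The batch-frequency and hard-call filters described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The text does not name the statistical test used for the batch comparison, so a Pearson chi-square test (1 degree of freedom) on allele counts is assumed here. "Certainty" is assumed to mean the mean maximum posterior genotype probability across samples. All function names are hypothetical.

```python
import math


def variant_certainty(genotype_probs):
    """Mean of the largest posterior genotype probability across samples.

    Assumed definition of "certainty"; the source does not define it.
    genotype_probs: list of (P(0 alt), P(1 alt), P(2 alt)) per sample.
    """
    return sum(max(p) for p in genotype_probs) / len(genotype_probs)


def hard_call(probs):
    """Best-guess genotype: the alt-allele count with the highest probability."""
    return max(range(3), key=lambda g: probs[g])


def allele_freq_pvalue(alt_a, total_a, alt_b, total_b):
    """P-value of a Pearson chi-square test on a 2x2 allele-count table.

    Assumed test; compares alt-allele frequency in group A against group B.
    """
    ref_a, ref_b = total_a - alt_a, total_b - alt_b
    alt, ref = alt_a + alt_b, ref_a + ref_b
    if min(total_a, total_b, alt, ref) == 0:
        return 1.0  # monomorphic or empty group: no evidence of a difference
    n = total_a + total_b
    chi2 = n * (alt_a * ref_b - alt_b * ref_a) ** 2 / (total_a * total_b * alt * ref)
    # Survival function of chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def fails_batch_check(batch_counts, alpha=5e-5):
    """True if any batch differs from the rest of the sample at P < alpha.

    batch_counts: list of (alt_allele_count, total_allele_count) per batch.
    """
    all_alt = sum(alt for alt, _ in batch_counts)
    all_total = sum(total for _, total in batch_counts)
    return any(
        allele_freq_pvalue(alt, total, all_alt - alt, all_total - total) < alpha
        for alt, total in batch_counts
    )


def keep_variant(genotype_probs, info, batch_counts,
                 min_certainty=0.9, min_info=0.95):
    """Apply the thresholds quoted in the text to a single variant."""
    return (
        info >= min_info
        and variant_certainty(genotype_probs) >= min_certainty
        and not fails_batch_check(batch_counts)
    )
```

For example, a variant with alt-allele counts of 200 out of 2,000 alleles in four batches and 600 out of 2,000 in the fifth would be flagged by `fails_batch_check`, while five batches with identical counts would pass.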