Chunk #43 — Online Methods — Genotype data preprocessing

Source: Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression.
Embedded: yes

Text

The primary pre-processing and quality control of genotype data were conducted by each cohort, as specified in the original publications and in the Supplementary Note. The majority of cohorts used genotypes imputed to the 1000 Genomes phase 1 version 3 (1000G p1v3) reference panel or a newer one. GenotypeHarmonizer (ref. 55) v1.4.9 (https://github.com/molgenis/systemsgenetics/wiki/Genotype-Harmonizer) was used to harmonize all genotype datasets to the GIANT 1000G p1v3 ALL reference panel (ftp://share.sph.umich.edu/1000genomes/fullProject/2012.03.14/GIANT.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.ALL.vcf.gz.tgz) and to fix potential strand issues for A/T and C/G SNPs. Each cohort tested SNPs with minor allele frequency (MAF) > 0.01, Hardy-Weinberg P-value > 0.0001, call rate > 0.95, and MACH r² > 0.5. Reported SNP identifiers are in dbSNP v137.
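
The SNP inclusion thresholds above can be illustrated with a minimal Python sketch. It is not the cohorts' actual pipeline: the text does not say which software applied the filters, so the `SnpStats` record layout and the function names `passes_qc` and `is_strand_ambiguous` are hypothetical, and the sketch assumes per-SNP summary statistics have already been computed. Only the four thresholds and the A/T and C/G allele pairs come from the text. The helper only flags strand-ambiguous SNPs; it does not attempt to resolve strand, which is left to the harmonization tool.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; every comparison is strict (">").
MIN_MAF = 0.01
MIN_HWE_P = 0.0001
MIN_CALL_RATE = 0.95
MIN_MACH_R2 = 0.5

# Allele pairs that read the same on both strands (A/T and C/G).
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical record layout)."""
    snp_id: str
    allele_a: str
    allele_b: str
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg P-value
    call_rate: float  # fraction of samples with a genotype call
    mach_r2: float    # MACH imputation quality


def passes_qc(snp: SnpStats) -> bool:
    """Return True if the SNP exceeds all four thresholds."""
    return (
        snp.maf > MIN_MAF
        and snp.hwe_p > MIN_HWE_P
        and snp.call_rate > MIN_CALL_RATE
        and snp.mach_r2 > MIN_MACH_R2
    )


def is_strand_ambiguous(snp: SnpStats) -> bool:
    """Return True for A/T and C/G SNPs, whose strand cannot be
    inferred from the alleles alone."""
    alleles = frozenset((snp.allele_a.upper(), snp.allele_b.upper()))
    return alleles in AMBIGUOUS_PAIRS


# Made-up example records, for illustration only.
snps = [
    SnpStats("snp1", "A", "G", 0.20, 0.50, 0.99, 0.90),
    SnpStats("snp2", "A", "T", 0.30, 0.40, 0.98, 0.80),
    SnpStats("snp3", "C", "T", 0.005, 0.70, 0.99, 0.95),
]
tested = [s.snp_id for s in snps if passes_qc(s)]
ambiguous = [s.snp_id for s in snps if is_strand_ambiguous(s)]
```

Because the comparisons are strict, a SNP with MAF exactly equal to 0.01 would be excluded under this reading of the thresholds.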