Chunk #44 — Methods — Genotype quality control and imputation

Source
Genome-wide association analyses identify 95 risk loci and provide insights into the neurobiology of post-traumatic stress disorder.

Text

After quality control, datasets were lifted over to the GRCh37/hg19 human genome reference build. SNP name inconsistencies were corrected, and genotypes were aligned to the strand of the imputation reference panel. Markers with non-matching allele codes, or with an excessive MAF difference (> 0.15) relative to the corresponding population in the reference data, were removed. The pipeline was modified so that only the largest homogeneous ancestry group in the data was used to calculate MAF. For strand-ambiguous markers (A/T and C/G), strand was matched by comparing allele frequencies: if a strand flip resulted in a lower MAF difference between the study and the reference data, the strand was flipped. Ambiguous markers with high MAF (> 0.4) were removed. The genome was broken into 132 approximately equally sized chunks. For each chunk, genotypes were phased using Eagle v2.3.5, and phased genotypes were imputed into the Haplotype Reference Consortium panel (ref. 86) using minimac3. Imputed datasets were deposited with the PGC DAC and are available for approved requests.
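
The marker-level alignment rules described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function name `check_marker`, its arguments and its return labels are hypothetical, and only the two thresholds (0.15 and 0.4) come from the text. The sketch also assumes that the "MAF difference" is evaluated as the difference in frequency of the same named allele in the study and reference data, since the minor allele frequency itself does not change under a strand flip.

```python
# Hypothetical sketch of the per-marker alignment checks; not the authors' code.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

MAX_FREQ_DIFF = 0.15      # "excessive MAF difference (> 0.15)" from the text
MAX_AMBIGUOUS_MAF = 0.4   # "ambiguous markers with high MAF (> 0.4)" from the text


def is_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs read the same on both strands."""
    return COMPLEMENT.get(a1) == a2


def check_marker(study_a1, study_a2, study_freq_a1, ref_a1, ref_a2, ref_freq_a1):
    """Return "keep", "flip" or "remove" for one biallelic A/C/G/T SNP.

    study_freq_a1 is the frequency of study_a1 in the study data (assumed to be
    computed in the largest homogeneous ancestry group); ref_freq_a1 is the
    frequency of ref_a1 in the matching reference population.
    """
    study = {study_a1, study_a2}
    ref = {ref_a1, ref_a2}

    if is_ambiguous(study_a1, study_a2):
        if study != ref:
            return "remove"  # non-matching allele codes
        if min(study_freq_a1, 1 - study_freq_a1) > MAX_AMBIGUOUS_MAF:
            return "remove"  # frequencies too close to 0.5 to infer strand
        # Frequency of ref_a1 in the study if both are on the same strand.
        same = study_freq_a1 if study_a1 == ref_a1 else 1 - study_freq_a1
        flipped = 1 - same  # the same quantity if the study strand is flipped
        diff_same = abs(same - ref_freq_a1)
        diff_flip = abs(flipped - ref_freq_a1)
        if diff_flip < diff_same:
            action, diff = "flip", diff_flip
        else:
            action, diff = "keep", diff_same
    else:
        if study == ref:
            action, a1_on_ref = "keep", study_a1
        elif {COMPLEMENT[a] for a in study} == ref:
            action, a1_on_ref = "flip", COMPLEMENT[study_a1]
        else:
            return "remove"  # non-matching allele codes on either strand
        freq = study_freq_a1 if a1_on_ref == ref_a1 else 1 - study_freq_a1
        diff = abs(freq - ref_freq_a1)

    return "remove" if diff > MAX_FREQ_DIFF else action
```

For example, under these assumptions an A/T marker with a study frequency of 0.20 for A and a reference frequency of 0.78 for A would be flipped, because flipping reduces the frequency difference from 0.58 to 0.02, whereas an A/T marker with a study frequency of 0.45 would be removed because its strand cannot be inferred reliably from frequency.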