Chunk #53 — Methods — Imputation

Source: International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci.
Embedded: yes

Text

Imputation used the 1000 Genomes Project phase 3 reference data (1KGP phase 3; ref. 71). Any dataset using a human genome assembly version prior to GRCh37 (hg19) was lifted over to GRCh37 (hg19). SNP alignment proceeded as follows: for each dataset, SNPs were aligned to the same strand as the 1KGP phase 3 data. For strand-ambiguous markers (A/T and C/G SNPs), allele frequencies were calculated in the dataset's largest ancestry group, and only SNPs with MAF < 40% and an allele-frequency difference of ≤15% from the matching 1KGP phase 3 ancestry group were retained. Pre-phasing was performed using default settings in SHAPEIT2 v2.r837 (ref. 72) without reference subjects. Phasing was done in 3 megabase (Mb) blocks, with an additional 1 Mb buffer added to either end of each block. Haplotypes were then imputed using default settings in IMPUTE2 v2.2.2 (ref. 73), with the 1KGP phase 3 reference data and genetic map, a 1 Mb buffer, and effective population size set to 20,000. The RICOPILI default filters for MAF and Info were removed, since analyses were run across ancestry groups at this step. Imputed datasets were deposited with the PGC DAC and are available for approved requests.
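The ambiguous-marker filter and the block layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RICOPILI implementation: all function names are hypothetical, the "≤15% difference" is read as an absolute difference in the frequency of the same allele between the study and the matching 1KGP phase 3 ancestry group, and the block coordinates are assumed to be 1-based and inclusive.

```python
# Hypothetical helpers; names and coordinate conventions are assumptions,
# not taken from RICOPILI, SHAPEIT2, or IMPUTE2.

# A/T and C/G SNPs read the same on both strands, so strand cannot be
# inferred from the allele labels alone.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def is_strand_ambiguous(allele1: str, allele2: str) -> bool:
    """Return True for A/T and C/G SNPs."""
    return frozenset((allele1.upper(), allele2.upper())) in AMBIGUOUS_PAIRS


def keep_ambiguous_snp(study_freq: float, ref_freq: float,
                       maf_max: float = 0.40, diff_max: float = 0.15) -> bool:
    """Apply the two retention criteria to a strand-ambiguous SNP.

    study_freq and ref_freq are frequencies of the same named allele in the
    study's largest ancestry group and in the matching reference group.
    """
    maf = min(study_freq, 1.0 - study_freq)
    return maf < maf_max and abs(study_freq - ref_freq) <= diff_max


def phasing_windows(chrom_length_bp: int, block_bp: int = 3_000_000,
                    buffer_bp: int = 1_000_000):
    """Split a chromosome into blocks, each padded by a buffer on both ends.

    Returns (window_start, block_start, block_end, window_end) tuples,
    with the buffered window clipped to the chromosome boundaries.
    """
    windows = []
    start = 1
    while start <= chrom_length_bp:
        end = min(start + block_bp - 1, chrom_length_bp)
        windows.append((max(1, start - buffer_bp), start, end,
                        min(chrom_length_bp, end + buffer_bp)))
        start = end + 1
    return windows
```

The MAF ceiling matters because an ambiguous SNP with allele frequencies near 50% looks the same on either strand, so a frequency comparison against the reference cannot reveal a flip. Below the ceiling, a flipped SNP shows a large frequency discrepancy and is caught by the second criterion.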