
Chunk #7 — Online methods — Imputation analysis

Source
Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes.

Text

A detailed description of the imputation procedures used by the OncoArray consortium, and in this Lung OncoArray project, has been published previously.5 Briefly, the reference dataset was the 1000 Genomes Project Phase 3 (haplotype release of October 2014). The forward-strand alignment of SNPs genotyped on the OncoArray was confirmed by aligning the sequences used to define each SNP against the 1000 Genomes reference with BLAST.

Any ambiguous SNPs were further checked by comparing their allele frequencies with those of the corresponding 1000 Genomes variants. Allele frequencies were calculated from large collections of control samples of European (108,000 samples) and Asian (11,000 samples) ancestry. A difference statistic was then calculated as

$$D = \frac{\left(|p_1 - p_2| - 0.01\right)^2}{(p_1 + p_2)(2 - p_1 - p_2)}$$

where $p_1$ and $p_2$ are the allele frequencies in our dataset and in 1000 Genomes, respectively.5 A SNP passed if $D$ was below 0.008 in Europeans and below 0.012 in Asians. SNPs whose frequencies would match if the alleles were flipped were excluded from imputation, but not from the association analyses.5 A/T and G/C SNPs were not present on the previously genotyped lower-density arrays. Because all imputation was performed to the same standard, all SNPs had the same
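The frequency check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the consortium's pipeline: the function names, the population labels `EUR` and `ASN`, the reading of "pass" as strictly below the cutoff, and the allele-flip test (comparing $p_1$ against $1 - p_2$, the frequency of the other allele) are all hypothetical choices made for this example.

```python
# Cutoffs quoted in the text. The labels "EUR"/"ASN" are hypothetical keys,
# and "pass" is ASSUMED to mean the statistic is strictly below the cutoff.
CUTOFFS = {"EUR": 0.008, "ASN": 0.012}


def difference_statistic(p1: float, p2: float) -> float:
    """Frequency-difference statistic D as written in the text.

    p1: allele frequency in the study controls.
    p2: frequency of the same allele in the 1000 Genomes reference.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("allele frequencies must lie in [0, 1]")
    denom = (p1 + p2) * (2 - p1 - p2)
    if denom == 0:
        # Both frequencies are 0, or both are 1: D is undefined.
        raise ValueError("D is undefined when p1 + p2 is 0 or 2")
    return (abs(p1 - p2) - 0.01) ** 2 / denom


def classify_snp(p1: float, p2: float, population: str) -> str:
    """Return 'pass', 'exclude_from_imputation', or 'fail'.

    The flip test is a hypothetical reading of "the frequency would match
    if the alleles were flipped": compare p1 with the frequency of the
    other allele in the reference, 1 - p2.
    """
    cutoff = CUTOFFS[population]
    if difference_statistic(p1, p2) < cutoff:
        return "pass"
    if difference_statistic(p1, 1 - p2) < cutoff:
        # Matches only after flipping: kept for association, not imputation.
        return "exclude_from_imputation"
    return "fail"


if __name__ == "__main__":
    for p1, p2, pop in [(0.30, 0.32, "EUR"), (0.2, 0.8, "EUR"), (0.5, 0.6, "ASN")]:
        print(pop, p1, p2, classify_snp(p1, p2, pop))
```

The sketch tries the unflipped comparison first, so a SNP that matches in either orientation is reported as `pass`; a pipeline that wanted to flag such cases separately would need to test both orientations explicitly.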