Chunk #50 — Methods — UK Biobank genetic data

Source: Polygenic prediction via Bayesian regression and continuous shrinkage priors.
Embedded: yes

Text

The UK Biobank genetic data comprise 488,377 samples, which were phased and imputed to ~96 million variants using the Haplotype Reference Consortium (HRC) haplotype resource and the UK10K + 1KG reference panel. We leveraged the QC metrics provided by the UK Biobank [14] and removed samples with a mismatch between genetically inferred and self-reported sex, high genotype missingness or extreme heterozygosity, or sex chromosome aneuploidy, as well as samples that were excluded from kinship inference and autosomal phasing. We further restricted the analysis to unrelated white British participants. We conducted simulation studies using 819,941 HapMap3 SNPs after removing ambiguous (A/T and C/G) SNPs and markers with minor allele frequency (MAF) <1%, missing rate >1%, imputation quality INFO score <0.8, or significant deviation from Hardy-Weinberg equilibrium (HWE; P < 1 × 10^-10). All genetic analyses in the UK Biobank were conducted using PLINK 1.9 [54] (https://www.cog-genomics.org/plink/1.9).
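
The marker-level filters listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run in PLINK 1.9. The `Variant` record, its field names, and the helper functions are hypothetical and introduced only for illustration; the thresholds are the ones stated in the text, and the sketch assumes that per-SNP summary statistics (MAF, missing rate, INFO score, HWE P value) have already been computed.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; a marker is removed if it falls
# on the wrong side of any one of them.
MAF_MIN = 0.01        # remove MAF < 1%
MISSING_MAX = 0.01    # remove missing rate > 1%
INFO_MIN = 0.8        # remove INFO score < 0.8
HWE_P_MIN = 1e-10     # remove HWE P < 1e-10

# Strand-ambiguous allele pairs: A/T and C/G.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-SNP summary record (field names are assumptions)."""
    snp_id: str
    a1: str
    a2: str
    maf: float
    missing_rate: float
    info: float
    hwe_p: float


def is_ambiguous(a1: str, a2: str) -> bool:
    """Return True for strand-ambiguous SNPs (A/T or C/G), in either order."""
    return frozenset((a1.upper(), a2.upper())) in AMBIGUOUS_PAIRS


def passes_qc(v: Variant) -> bool:
    """Return True if the variant survives every marker-level filter."""
    return (
        not is_ambiguous(v.a1, v.a2)
        and v.maf >= MAF_MIN
        and v.missing_rate <= MISSING_MAX
        and v.info >= INFO_MIN
        and v.hwe_p >= HWE_P_MIN
    )


def filter_variants(variants):
    """Keep only the variants that pass all filters, preserving input order."""
    return [v for v in variants if passes_qc(v)]
```

Calling `filter_variants` on a list of such records returns the retained markers. The comparisons are written so that values exactly at a threshold (for example, MAF equal to 1%) are kept, which matches the strict inequalities in the text.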