Chunk #67 — Online Methods — UK Biobank data set

Source
Fast and accurate long-range phasing in a UK Biobank cohort.

Text

We analyzed data from the UK Biobank, consisting of 152,729 samples typed at ≈800,000 SNPs. Using PLINK2 (ref. 47) (see URLs), we removed 480 individuals marked for exclusion from genomic analyses based on missingness and heterozygosity filters, leaving 152,249 samples (see URLs, Genotyping and QC). We restricted the SNP set to autosomal, biallelic SNPs with MAF≥0.1% and missingness ≤5%, leaving 627K SNPs (26,695 on the short arm of chromosome 1, 31,090 on chromosome 10, and 16,367 on chromosome 20). We identified 72 trios based on IBS0<0.001, sex of parents, and age of trio members (see URLs, Genotyping and QC). Of the 72 trio children, 69 self-reported British ethnicity, one self-reported Indian ethnicity, and one self-reported Caribbean ethnicity. The remaining trio child did not self-report any ethnicity, but her parents self-reported Irish and “Any other white background” as their ethnicities. UK Biobank genotyping and QC analyses indicated that self-reported ethnicity aligned closely with genetic ancestry (see URLs); however, UK Biobank also curated a subset of 120,286 self-reported British samples recommended for GWAS.
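
The SNP thresholds and the IBS0 criterion described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (they report using PLINK2). The function names `snp_qc_mask` and `ibs0_rate`, the 0/1/2 allele-count encoding with NaN for missing calls, and the toy matrix are all assumptions made for illustration. IBS0 is taken here to mean the fraction of jointly called SNPs at which two samples are opposite homozygotes, which is expected to be near zero for a true parent-child pair.

```python
import numpy as np


def snp_qc_mask(genotypes, maf_min=0.001, miss_max=0.05):
    """Boolean mask of SNPs with MAF >= maf_min and missingness <= miss_max.

    genotypes: (n_samples, n_snps) array of allele counts 0/1/2,
    with np.nan marking a missing call (hypothetical encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    missing = np.isnan(g)
    miss_rate = missing.mean(axis=0)          # per-SNP fraction of missing calls
    n_called = (~missing).sum(axis=0)         # per-SNP number of called samples
    alt_count = np.nansum(g, axis=0)          # per-SNP count of the counted allele
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_count / (2.0 * n_called)
        maf = np.minimum(alt_freq, 1.0 - alt_freq)
        passes_maf = maf >= maf_min           # NaN (no calls) compares as False
    return (miss_rate <= miss_max) & passes_maf


def ibs0_rate(a, b):
    """Fraction of jointly called SNPs where two samples share no allele."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    called = ~(np.isnan(a) | np.isnan(b))
    if not called.any():
        return float("nan")
    # Opposite homozygotes (0 vs 2) are the only pairs sharing zero alleles.
    return float((np.abs(a[called] - b[called]) == 2).mean())


if __name__ == "__main__":
    # Toy data: 4 samples x 4 SNPs (illustrative only).
    toy = np.array([
        [0, 1, 2, 0],
        [0, 1, 0, np.nan],
        [0, 2, 2, 1],
        [0, 0, 1, np.nan],
    ])
    print(snp_qc_mask(toy))                       # SNP 0 is monomorphic, SNP 3 is 50% missing
    print(ibs0_rate([0, 1, 2, 0], [2, 1, 0, 0]))  # two of four sites are opposite homozygotes
```

In a trio search along these lines, a pair would be kept as a candidate parent-child pair only if `ibs0_rate` fell below 0.001, with the sex and age checks from the text applied afterwards.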