Chunk #14 — Ancestral diversity and cryptic relatedness

Source: The UK Biobank resource with deep phenotyping and genomic data.
Embedded: yes
Text

The genotype data provide a unique opportunity to study the diverse ancestral origins (Extended Data Table 3) of UK Biobank participants. Accounting for ancestral background is essential both for epidemiological studies and for genetic analyses such as GWAS (ref. 19). We used PCA to measure population structure within the UK Biobank cohort (see Methods). Figure 3a shows results for the first four principal components plotted in consecutive pairs (see also Extended Data Fig. 3 and Supplementary Figs. 6, 7). As expected, individuals with similar principal component scores have similar self-reported ethnic backgrounds. For example, the first two principal components separate individuals with sub-Saharan African ancestry, European ancestry and east Asian ancestry. Individuals who self-report as mixed ethnicity tend to fall on a continuum between their constituent groups. Further principal components capture population structure at sub-continental geographic scales (Extended Data Fig. 3). Our PCA also revealed population structure within the most common self-reported ethnic background category, ‘British’ within the broader-level group ‘white’ (88.26% of participants; Supplementary Fig. 8). We used a combination of self-reported ethnic background and PCA results to provide researchers with a list of 409,728 individuals (84% of the cohort) who have very similar ancestral backgrounds relative to the full cohort (see Methods).
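The two steps described above, computing principal components from genotypes and then combining them with self-reported background to pick an ancestrally similar subset, can be illustrated with a minimal sketch on synthetic data. This is not the pipeline described in Methods: the function names (`simulate_genotypes`, `genotype_pca`, `select_similar_ancestry`), the toy population model and the median-absolute-deviation threshold are all assumptions made for illustration only.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_snps, rng, spread=0.15):
    """Toy 0/1/2 genotypes for populations with shifted allele frequencies (hypothetical model)."""
    base = rng.uniform(0.2, 0.8, size=n_snps)
    blocks, labels = [], []
    for pop, n in enumerate(n_per_pop):
        # Each population gets its own allele frequencies, perturbed around a shared base.
        freqs = np.clip(base + rng.normal(0.0, spread, size=n_snps), 0.05, 0.95)
        blocks.append(rng.binomial(2, freqs, size=(n, n_snps)))
        labels += [pop] * n
    return np.vstack(blocks).astype(float), np.array(labels)


def genotype_pca(G, n_components=4):
    """PCA of a samples x SNPs genotype matrix via SVD of the standardized matrix."""
    p = G.mean(axis=0) / 2.0                      # estimated allele frequency per SNP
    keep = (p > 0) & (p < 1)                      # drop monomorphic SNPs
    G, p = G[:, keep], p[keep]
    Z = (G - 2 * p) / np.sqrt(2 * p * (1 - p))    # center and scale each SNP
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # principal component scores


def select_similar_ancestry(scores, self_report, target, n_mads=6.0):
    """Keep individuals who self-report `target` AND lie near that group's median on every PC.

    The MAD-based cutoff is an illustrative assumption, not the rule used in the paper.
    """
    in_group = self_report == target
    med = np.median(scores[in_group], axis=0)
    mad = np.median(np.abs(scores[in_group] - med), axis=0)
    close = np.all(np.abs(scores - med) <= n_mads * mad, axis=1)
    return in_group & close


rng = np.random.default_rng(0)
G, genetic_pop = simulate_genotypes([200, 60, 60], n_snps=500, rng=rng)

# Self-report usually matches the genetic cluster; here five individuals deviate.
self_report = genetic_pop.copy()
self_report[200:205] = 0

scores = genotype_pca(G, n_components=4)
selected = select_similar_ancestry(scores, self_report, target=0)
print(f"{selected.sum()} of {(self_report == 0).sum()} self-reported group-0 individuals retained")
```

In this toy setting the leading components separate the simulated populations, so individuals whose self-report disagrees with their genetic cluster fall far from the group median and are excluded. Requiring both criteria is the design point: self-report alone admits genetic outliers, while principal components alone ignore the stated background.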