Chunk #27 — Methods — Samples selection from Biobanks

Source: Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals.

the UKBB workgroup, thus selecting individuals that were (a) closer than 5 and (b) farther than 15 cumulated standard deviations from the UKBB GWAS training set; these defined the genetically “European” and “non-European” sample sets, respectively (Supplementary Fig. 4a). The “European” set was randomly downsampled to 5000 individuals, defining “UK EUR”. To select the LAD sources, we performed a preliminary global ancestry analysis of the “non-European” set with ADMIXTURE³² at K = 5 (Supplementary Fig. 4b). We used only markers from chromosome 1 and projected the samples onto the allele frequencies (P file) obtained by running ADMIXTURE with the same parameters on the 2305 samples of the 1000 Genomes Project²⁶. We also discarded samples whose cumulative South Asian or Native American ancestry exceeded 20%. We then partitioned the genetically non-European samples according to the global ancestry proportions obtained from LAD, using a 5% threshold to call the presence or absence of each ancestry in an individual; this defined “UK AFR”, “UK EAS”, “UK EURAFR”, “UK EUREAS” and “UK EUREASAFR”. “UK FAREUR”, most likely composed of Southern European and West Asian individuals, was defined by downsampling to 5000 a group of samples that LAD inferred to be more than 95% European but coming from
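Two computational steps in this paragraph can be illustrated with a short Python sketch. The first is projecting samples onto fixed reference allele frequencies. The second is turning global ancestry proportions into presence/absence groups. This is a minimal illustration under stated assumptions, not the authors' pipeline. `project_q` uses a plain EM update for the ancestry proportions with the P matrix held fixed, and this update may differ from the optimiser ADMIXTURE uses in its own projection mode. The helpers `passes_filter`, `assign_group` and `is_fareur_candidate` are hypothetical. Several further choices are also assumptions: reading the 20% rule as the sum of South Asian and Native American proportions, using a strict inequality at the 5% cutoff, and using the EUR/EAS/AFR ordering of group names.

```python
import numpy as np


def project_q(genotypes, P, n_iter=1000, tol=1e-8):
    """Estimate ancestry proportions q for one sample with P held fixed.

    genotypes: length-M allele counts (0, 1, 2; np.nan marks missing calls).
    P: (M, K) frequencies of the counted allele in each of K clusters, e.g.
       estimated on a reference panel (the role of the "P file").
    Plain EM update; an illustrative stand-in, not ADMIXTURE's optimiser.
    """
    g = np.asarray(genotypes, dtype=float)
    keep = ~np.isnan(g)                      # drop missing genotypes
    g = g[keep]
    P = np.clip(np.asarray(P, dtype=float)[keep], 1e-6, 1 - 1e-6)
    K = P.shape[1]
    q = np.full(K, 1.0 / K)                  # start from equal proportions
    for _ in range(n_iter):
        a = P * q                            # q_k * p_mk
        b = (1 - P) * q                      # q_k * (1 - p_mk)
        # Expected number of allele copies at each SNP drawn from cluster k;
        # each row sums to 2, so dividing by 2M keeps q on the simplex.
        resp = (g[:, None] * a / a.sum(axis=1, keepdims=True)
                + (2 - g)[:, None] * b / b.sum(axis=1, keepdims=True))
        q_new = resp.sum(axis=0) / (2 * g.size)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q


# Hypothetical ancestry order, inferred from group names such as "UK EUREASAFR".
ORDER = ("EUR", "EAS", "AFR")
GROUPS = {"AFR", "EAS", "EURAFR", "EUREAS", "EUREASAFR"}


def passes_filter(props, max_sas_amr=0.20):
    """Keep a sample unless its summed SAS + AMR proportion exceeds 20% (assumed reading)."""
    return props.get("SAS", 0.0) + props.get("AMR", 0.0) <= max_sas_amr


def assign_group(props, threshold=0.05):
    """Call an ancestry 'present' if its global proportion exceeds the threshold."""
    present = "".join(a for a in ORDER if props.get(a, 0.0) > threshold)
    return "UK " + present if present in GROUPS else None


def is_fareur_candidate(props):
    """Candidate for the 'UK FAREUR' pool: more than 95% European by LAD."""
    return props.get("EUR", 0.0) > 0.95


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P_ref = rng.uniform(0.05, 0.95, size=(20000, 2))     # synthetic panel
    g = rng.binomial(2, P_ref @ np.array([0.7, 0.3]))    # 70/30 admixed sample
    print(project_q(g, P_ref).round(3))
    print(assign_group({"EUR": 0.55, "EAS": 0.03, "AFR": 0.42}))  # UK EURAFR
```

With strict thresholds, a non-European sample whose only called ancestry is EUR gets no group from `assign_group`. This matches the text, where such samples are handled separately through the more-than-95%-European criterion for “UK FAREUR”.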