Chunk #26 — Methods — Samples selection from Biobanks

Source: Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals.
Embedded: yes

Text

To test T2D, height and BMI in EstBB, we used the same set of 1923 individuals, including 942 cases of diabetes, obtained by removing the samples used in training the T2D PS in Läll et al.28 and the samples genotyped by sequencing. To test breast cancer with the PS from Michailidou et al.30, we used a set of 908 women, including 308 cases, obtained by removing prevalent samples used in training the PS in Michailidou et al.30. All these sets were filtered by removing samples with relatedness of 2nd degree and higher. To test predictivity in UKBB, we first removed all samples with relatedness of 3rd degree and higher, those for which relatedness could not be computed, and those present in the UKBB GWAS training set29. We then used a method adapted from the Neale Lab (https://github.com/Nealelab/UK_Biobank_GWAS) to draw ellipses in the space defined by the first 6 PCs pre-computed by the UKBB workgroup, thus selecting individuals who were (a) closer than 5 and (b) farther than 15 cumulative standard deviations from the UKBB GWAS training set: these two groups defined a genetically “European” and a “non-European” sample set, respectively (Supplementary Fig.