For most of the case–control cohort analyses, we restricted the statistical tests to a homogeneous test cohort of European genetic ancestry. We predicted genetic ancestries from the exome data using peddy v0.4.2 with the ancestry-labelled 1,000 Genomes Project as the reference (ref. 55). Of the 287,917 UKB sequences, 18,212 (6.3%) had a Pr(European) ancestry prediction of less than 0.99. Focusing on the remaining 269,706 UKB participants, we further restricted the European ancestry cohort to those within ±4 s.d. of the mean on each of the top four principal components. This excluded an additional 535 (0.2%) outlier participants. In total, 269,171 predominantly unrelated participants of European ancestry were included in our European case–control analyses. We also used peddy-derived ancestry predictions to perform case–control PheWAS within non-European populations in which at least 1,000 exome-sequenced individuals were available (see the section ‘Collapsing analyses’). Through this step, we identified and used 4,744 (Pr(African) > 0.95), 1,475 (Pr(East Asian) > 0.95) and 5,714 (Pr(South Asian) > 0.95) UKB participants for ancestry-independent collapsing analyses.
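The two-stage European cohort filter described above can be illustrated with a minimal Python sketch. It is not the code used in the study. The record layout (`prob`, `pcs`), the function name `select_ancestry_cohort`, the use of the sample standard deviation, and the choice to compute the principal component means and s.d. on the probability-filtered samples are all assumptions made for illustration.

```python
from statistics import mean, stdev


def select_ancestry_cohort(samples, min_prob=0.99, n_pcs=4, max_sd=4.0):
    """Return samples passing a probability threshold and a PC outlier filter.

    Each sample is a dict with hypothetical keys:
      "prob": predicted probability of the target ancestry
      "pcs":  sequence of principal component scores (at least n_pcs long)
    Needs at least two samples to pass the probability threshold.
    """
    # Stage 1: drop samples whose ancestry probability is below the threshold.
    confident = [s for s in samples if s["prob"] >= min_prob]

    # Stage 2: per-PC mean and sample s.d., computed on the retained samples
    # (assumption: the source does not state which set the moments come from).
    moments = []
    for i in range(n_pcs):
        values = [s["pcs"][i] for s in confident]
        moments.append((mean(values), stdev(values)))

    def within_bounds(sample):
        # Keep a sample only if it lies within max_sd s.d. of the mean on every PC.
        return all(
            abs(sample["pcs"][i] - m) <= max_sd * sd
            for i, (m, sd) in enumerate(moments)
        )

    return [s for s in confident if within_bounds(s)]
```

Under the same assumptions, calling the function with `min_prob=0.95` and skipping the outlier stage would correspond to the thresholds reported for the non-European cohorts.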