Our quality control pipeline was designed specifically for a large-scale dataset of ethnically diverse participants who were genotyped in many batches on two slightly different arrays, and which will be used by many researchers to address a wide variety of research questions. Participants reported their ethnic background by selecting from a fixed set of categories¹⁴. Although most individuals (94%) reported their ethnic background as belonging to the broad-level group ‘white’, approximately 22,000 individuals reported an ethnic background originating outside Europe (Extended Data Table 3). We used approaches based on principal component analysis (PCA) to account for population structure in both marker-based and sample-based quality control (see Methods).
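As an illustration of the general idea only, and not the procedure described in Methods, the sketch below computes per-sample principal component scores from a genotype matrix. It assumes a complete matrix of allele counts (0, 1 or 2) with no missing calls. Each marker is standardized using its allele frequency before a singular value decomposition. The function name `genotype_pcs` is hypothetical.

```python
import numpy as np


def genotype_pcs(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return per-sample principal component scores (hypothetical helper).

    genotypes: (n_samples, n_markers) array of allele counts 0/1/2.
    Assumes no missing genotypes (a simplification for illustration).
    """
    g = genotypes.astype(float)
    # Estimated allele frequency of each marker
    p = g.mean(axis=0) / 2.0
    # Drop monomorphic markers, which have zero variance and cannot be standardized
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Standardize each marker: (g - 2p) / sqrt(2p(1 - p))
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Thin SVD; left singular vectors scaled by singular values are the PC scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Usage on simulated data: two groups with different allele frequencies
rng = np.random.default_rng(0)
n, m = 100, 500
p1 = rng.uniform(0.1, 0.9, m)
p2 = np.clip(p1 + rng.normal(0.0, 0.2, m), 0.05, 0.95)
geno = np.vstack([rng.binomial(2, p1, size=(n, m)), rng.binomial(2, p2, size=(n, m))])
pcs = genotype_pcs(geno, n_components=2)
```

In this simulation the first component separates the two groups. Such scores can then be used to model population structure in downstream filters. At biobank scale a full SVD is impractical, so truncated or randomized methods are typically used in its place.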