Principal components analysis (PCA) was performed with the SNPRelate package (ref. 42) in R (see Supplementary Information for further details). The relevant PCs were selected by inspecting scatter plots of pairs of PCs on the x and y axes, which showed the spread of genetic ancestry within each self-identified racial/ethnic cluster. A parallel coordinates plot of the first 10 PCs was also generated, in which each PAGE individual is represented by a set of line segments connecting his or her PC values. The amount of variance explained diminished with each subsequent PC, and we estimated that the top 10 PCs provided sufficient information to explain the majority of genetic variation in the PAGE study population.
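The reasoning about diminishing variance explained can be illustrated with a minimal sketch. It is not the SNPRelate workflow used in the analysis. It assumes only that the PCA yields a list of eigenvalues in decreasing order, and it shows how the per-PC and cumulative proportions of variance might be computed from them. The function names `variance_explained` and `n_pcs_for_threshold`, the threshold, and the eigenvalues in the usage example are hypothetical illustrations, not PAGE results.

```python
from itertools import accumulate


def variance_explained(eigenvalues):
    """Return per-PC and cumulative proportions of variance.

    `eigenvalues` is assumed to hold ALL eigenvalues from the PCA, sorted
    in decreasing order. If only the top few are supplied, the proportions
    are relative to that subset rather than to the total variance.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    proportions = [ev / total for ev in eigenvalues]
    cumulative = list(accumulate(proportions))
    return proportions, cumulative


def n_pcs_for_threshold(eigenvalues, threshold):
    """Smallest number of leading PCs whose cumulative proportion of
    variance reaches `threshold` (a hypothetical cut-off, e.g. 0.8)."""
    _, cumulative = variance_explained(eigenvalues)
    for k, cum in enumerate(cumulative, start=1):
        if cum >= threshold:
            return k
    return len(eigenvalues)


# Toy eigenvalues chosen for easy arithmetic; NOT values from the PAGE data.
toy_eigenvalues = [8.0, 4.0, 2.0, 1.0, 1.0]
proportions, cumulative = variance_explained(toy_eigenvalues)
print(proportions)  # per-PC proportion of variance
print(cumulative)   # running total across PCs
print(n_pcs_for_threshold(toy_eigenvalues, 0.8))
```

In this toy case each PC explains no more variance than the one before it, and the cumulative proportion flattens as PCs are added, which is the pattern the choice of a fixed number of top PCs relies on.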