A set of 47,000 variants that were genotyped on all arrays and were common (MAF > 10% in the combined sample), independent (R² < 0.5), and of high quality (missing rate < 2% and Hardy-Weinberg equilibrium (HWE) P-values > 0.001) was used to identify duplicate samples included on multiple arrays and to confirm the reported pedigree structure. Family structures were altered as needed, and genotypes were checked for Mendelian inconsistencies using Pedcheck 36 with the revised family structure. Inconsistent genotypes were set to missing. The same set of 47,000 variants was also used to calculate principal components (PCs) using Eigenstrat 37 and 1000 Genomes (Phase 3, version 5). Based on the first two PCs, each individual was then assigned a race classification (AA, EA, or Other). To maximize the value of the multiplex family recruitment strategy of COGA, family-based analyses were performed. Each family was assigned a family-based race corresponding to the individual-based race of the majority of its members.
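
As an illustration only, the per-variant thresholds and the family-level majority rule described above can be sketched in Python. This is not the pipeline used in the study: the names `VariantStats`, `passes_qc`, and `family_race` are hypothetical, the independence criterion (R² < 0.5) is omitted because it requires pairwise linkage disequilibrium estimates, "majority" is interpreted as the most frequent classification, and the handling of ties (returning `None`) is an assumption, since the text does not state how ties were resolved.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VariantStats:
    """Hypothetical per-variant summary statistics."""
    maf: float           # minor allele frequency in the combined sample
    missing_rate: float  # fraction of missing genotype calls
    hwe_p: float         # Hardy-Weinberg equilibrium P-value


def passes_qc(v: VariantStats) -> bool:
    """Apply the thresholds stated in the text: MAF > 10%,
    missing rate < 2%, HWE P-value > 0.001.
    The R^2 < 0.5 independence criterion is not checked here."""
    return v.maf > 0.10 and v.missing_rate < 0.02 and v.hwe_p > 0.001


def family_race(individual_races: Sequence[str]) -> Optional[str]:
    """Return the most frequent individual-based classification in a family.
    Assumption: a tie for the top count returns None (undetermined)."""
    if not individual_races:
        raise ValueError("family has no classified members")
    counts = Counter(individual_races).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: not specified in the source text
    return counts[0][0]
```

For example, under these assumptions a family whose members are classified as EA, EA, and AA would be assigned EA, whereas a family with one EA and one AA member would be left undetermined.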