A total of 44,644 SNPs on the microarray and 135 individuals with call rates < 98% were excluded, and 62,076 additional SNPs were removed because their minor allele frequency (MAF) was < 1%. After data cleaning and quality control, 5,697 individuals and 889,659 SNPs remained for imputation. We identified several instances in which identical DNA marker profiles were linked to two different interview forms. When demographic information (sex, date of birth, number of reported children) was consistent across the two interviews, one sample, chosen at random, was removed from the analysis; when demographic information was inconsistent, both samples were removed. Genetic relationships in the family-based sample were examined by calculating pairwise identity-by-descent (IBD) proportion estimates using PLINK (12). Pairs of individuals whose IBD proportions did not match their reported genetic relationship were assigned to two different families, and pairs of individuals who shared more than 25% of their alleles IBD were assigned to the same family. Self-reported males with X chromosome heterozygosity > 20% and self-reported females with X chromosome heterozygosity < 20% were excluded unless their true identity could be determined.
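The thresholds described above can be illustrated with a minimal sketch. It is not the pipeline used in the study, which relied on PLINK for the IBD estimates; the function names, the 0/1/2 genotype coding with `None` for missing calls, and the union-find grouping are assumptions made purely for illustration. The sketch also omits the reassignment of pairs whose IBD proportions contradicted their reported relationship.

```python
from typing import Dict, Optional, Sequence, Tuple

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]

CALL_RATE_MIN = 0.98    # call rates below 98% were excluded
MAF_MIN = 0.01          # SNPs with MAF below 1% were removed
X_HET_CUTOFF = 0.20     # X chromosome heterozygosity cutoff for the sex check
IBD_SAME_FAMILY = 0.25  # pairs sharing more than 25% IBD join one family


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing calls for one SNP or one individual."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency among the non-missing calls at one SNP."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def x_heterozygosity(x_calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing X chromosome calls that are heterozygous."""
    observed = [g for g in x_calls if g is not None]
    if not observed:
        return 0.0
    return sum(g == 1 for g in observed) / len(observed)


def fails_sex_check(reported_sex: str, x_het: float) -> bool:
    """Flag males above the cutoff and females below it."""
    if reported_sex == "M":
        return x_het > X_HET_CUTOFF
    return x_het < X_HET_CUTOFF


def group_families(
    ids: Sequence[str], ibd: Dict[Tuple[str, str], float]
) -> Dict[str, str]:
    """Union-find grouping: merge any pair whose IBD proportion exceeds 0.25."""
    parent = {i: i for i in ids}

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for (a, b), proportion in ibd.items():
        if proportion > IBD_SAME_FAMILY:
            parent[find(a)] = find(b)
    # Each individual is labelled by the representative of its family.
    return {i: find(i) for i in ids}
```

In this sketch a SNP or individual would be kept when `call_rate(...) >= CALL_RATE_MIN`, a SNP when `minor_allele_frequency(...) >= MAF_MIN`, and an individual when `fails_sex_check(...)` returns `False`. The family labels returned by `group_families` are arbitrary representatives, so only equality between labels is meaningful.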