For many association tests, including candidate gene studies and replication studies of candidate chromosomal regions, it is useful to identify smaller sets of SNPs that can distinguish European substructure. Previous studies, including our own, used genome-wide sets of ≤10K SNPs. To identify a more robust set of SNPs that could distinguish the largest component of substructure observed in the current data, we used the genotypic differences observed in >300K SNPs between two groups of individuals: 150 Ashkenazi Jewish and 125 Northern European individuals. The Ashkenazi Jewish individuals were chosen because 1) this group was the most clearly distinguishable from the Northern European individuals, 2) it might more closely represent an “older” population of Mediterranean origin, and 3) we had a substantial number of genotyped individuals, enabling a good representation of this population. To select the SNPs that best distinguish these groups, we determined the informativeness (In) [22] for each of the >300K SNPs. The 20,000 SNPs with the highest In values were then selected. To ensure both a more uniform genome-wide distribution and minimize linkage