One approach to the statistical problem posed by the large number of tests is to use a second, independent dataset. Although filtering may help considerably with the computational challenges, the number of tests will likely remain very large and the effects to be detected relatively modest. As a result, the issue of statistical significance will likely remain even after judicious a priori filtering of predictors. The two-dataset design addresses this by using the initial data for hypothesis generation and the independent dataset to test only a small number of hypotheses, so that the multiple-testing correction at the second stage covers just the few hypotheses carried forward. If the key biological contributors to the phenotype are identified in the initial data, we would expect the signal to replicate in an ethnically matched, independent replication sample.
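To make the arithmetic behind this two-stage design concrete, the following is a minimal sketch rather than a prescribed analysis. It assumes a Bonferroni correction (the text does not name a specific correction), assumes the tests are pairwise interactions among predictors, and uses hypothetical interaction labels and p-values. The function names `pairwise_test_count`, `bonferroni_threshold`, `top_hypotheses`, and `replicated` are illustrative only.

```python
from math import comb


def pairwise_test_count(n_predictors: int) -> int:
    """Number of distinct predictor pairs (assumes one test per pair)."""
    return comb(n_predictors, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction (an assumption here)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def top_hypotheses(discovery_pvalues: dict, k: int) -> list:
    """Stage 1: rank hypotheses by discovery p-value and carry the k smallest forward."""
    return sorted(discovery_pvalues, key=discovery_pvalues.get)[:k]


def replicated(candidates: list, replication_pvalues: dict, alpha: float = 0.05) -> list:
    """Stage 2: test only the carried-forward hypotheses in the independent sample."""
    if not candidates:
        return []
    threshold = bonferroni_threshold(alpha, len(candidates))
    return [h for h in candidates if replication_pvalues[h] <= threshold]


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    n_tests = pairwise_test_count(1000)
    print("discovery tests:", n_tests)
    print("discovery threshold:", bonferroni_threshold(0.05, n_tests))

    discovery = {
        "snpA x snpB": 1e-6,
        "snpC x snpD": 3e-5,
        "snpE x snpF": 0.2,
        "snpG x snpH": 4e-4,
    }
    replication = {"snpA x snpB": 0.01, "snpC x snpD": 0.3}

    candidates = top_hypotheses(discovery, k=2)
    print("carried forward:", candidates)
    print("replicated:", replicated(candidates, replication))
```

In this sketch the replication threshold depends only on the number of hypotheses carried forward, not on the size of the original search, which is why a modest effect that could not survive correction in the initial data may still be confirmed in the second sample.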