Chunk #106 — III. Selected Methodological Issues — C. Analyses — 1. Achieving significant genome wide association in single samples vs seeking replication and generalization in multiple samples
The criterion used here identifies clustering based on chromosomal position. This approach allows direct comparison between datasets that assess different sets of SNPs in samples that may well differ in the details of their patterns of linkage disequilibrium. The Monte Carlo simulation methods used here make no assumptions about the underlying distribution of the data. Instead, they provide empirical p values based on repeated random sampling from the actual datasets analyzed. Such approaches are especially useful for assessing the significance of apparently reproducible, convergent results from multiple independent datasets that differ from each other in sample size (n), in the number and types of genomic markers, in the racial/ethnic background of the subjects, and in other key features. We are aware of no alternative method that assesses the significance of results obtained in multiple samples as tractably, and without distributional assumptions, as Monte Carlo approaches do. We use 10,000 Monte Carlo trials when moderately high significance is anticipated and 100,000 trials when extremely high significance is anticipated.
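The logic of a position-based Monte Carlo test can be illustrated with a minimal sketch. It is not the analysis code used for the work described here, and several details are assumptions made only for illustration: the function names, the 10 kb clustering window, the restriction to a single chromosome, the choice of test statistic (the number of nominally positive SNPs in one dataset that lie near a positive SNP in the other), and the (k + 1)/(N + 1) convention for the empirical p value. Each trial draws random SNPs from each dataset's own set of assayed markers, so differences in marker number and density between datasets are carried into the null distribution.

```python
import bisect
import random


def count_clustered(positions_a, positions_b, window):
    """Count positions in A that lie within `window` bp of any position in B."""
    sorted_b = sorted(positions_b)
    hits = 0
    for pos in positions_a:
        # First B position that is >= pos - window; it clusters with pos
        # only if it is also <= pos + window.
        i = bisect.bisect_left(sorted_b, pos - window)
        if i < len(sorted_b) and sorted_b[i] <= pos + window:
            hits += 1
    return hits


def empirical_p(assayed_a, assayed_b, positive_a, positive_b,
                window=10_000, trials=10_000, seed=0):
    """Return (observed statistic, empirical p value).

    assayed_*  : all SNP positions assayed in each dataset (hypothetical input)
    positive_* : positions of the nominally positive SNPs in each dataset
    """
    rng = random.Random(seed)
    observed = count_clustered(positive_a, positive_b, window)
    as_extreme = 0
    for _ in range(trials):
        # Resample from the markers actually assayed in each dataset,
        # keeping the number of "positive" SNPs fixed.
        draw_a = rng.sample(assayed_a, len(positive_a))
        draw_b = rng.sample(assayed_b, len(positive_b))
        if count_clustered(draw_a, draw_b, window) >= observed:
            as_extreme += 1
    # Assumed convention: add 1 to numerator and denominator so that the
    # estimate is never exactly zero.
    return observed, (as_extreme + 1) / (trials + 1)
```

Under this convention the smallest p value that N trials can report is 1/(N + 1): roughly 10^-4 for 10,000 trials and roughly 10^-5 for 100,000 trials. This is one reason to run more trials when higher significance is anticipated.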