When scoring gene sets, several sources of potential bias need to be considered:

Linkage disequilibrium patterns. Because markers in high LD may originate from a single association signal, an effective strategy may involve down-weighting P-values from regions with high LD compared to regions with relatively independent association signals. To this end, strategies have been proposed to group markers in high LD as a “proxy cluster” [8] or use LD blocks from the HapMap database as units of analysis [47] and then assign a single P-value for each cluster or LD block.

Overlapping genes. Another related and potentially serious problem may result from overlapping genes. When several functionally related genes in a gene set are clustered locally, careful attention should be paid to the SNPs mapped to overlapping genes. When selecting one or more of the most significant SNPs to represent each gene, gene set significance may be driven by only a few of these SNPs, because the significant SNPs mapped to multiple genes could be included multiple times. For example, in our analysis of the GAIN schizophrenia dataset [11], the "starch