This method performs an overrepresentation analysis in which the significance of each gene category is evaluated empirically. It can be applied to both unrelated and family-based datasets. The method selects the set of genes, of size n, tagged by SNPs that lie within the gene sequence or in the 20 kb flanking it upstream or downstream, and whose association p-value falls below a given threshold (p < 0.001, 0.005, 0.01, or 0.05). The association p-value is estimated using standard GWAS methods, as detailed in Text S3. SNPs in linkage disequilibrium are pruned by retaining only the most significant SNP among all SNPs that have r² > 0.2 and lie within 1 Mb of each other. If one SNP tags more than one gene, all of these genes are counted as significant. Although more than one SNP in linkage equilibrium (r² < 0.2) might tag the same gene, each gene is counted only once. The statistical significance of the overrepresentation of each gene set (the category-specific p-value) is calculated by comparing the observed number of significant genes with the number expected by chance.
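
The selection and counting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy, most-significant-first pruning, the input layout (a list of SNP records plus a pairwise r² lookup), and the function names `select_significant_genes` and `empirical_p` are all assumptions made for illustration. The text does not specify how the chance expectation is generated, so the sketch simply takes a list of null counts as input and applies the common (k + 1)/(N + 1) empirical estimate, which is likewise an assumption.

```python
def select_significant_genes(snps, r2, threshold=0.001,
                             window=1_000_000, r2_max=0.2):
    """Return (retained SNPs, set of genes they tag).

    snps: list of dicts with keys "id", "chrom", "pos", "p", "genes", where
          "genes" is the set of genes the SNP lies in or within 20 kb of
          (hypothetical layout; the gene mapping is assumed precomputed).
    r2:   dict mapping frozenset({id_a, id_b}) -> pairwise r^2
          (pairs missing from the dict are treated as r^2 = 0).
    """
    # Keep only SNPs below the threshold, most significant first.
    candidates = sorted((s for s in snps if s["p"] < threshold),
                        key=lambda s: s["p"])

    kept = []
    for snp in candidates:
        # Drop the SNP if a more significant retained SNP lies within
        # the window on the same chromosome and is in LD with it.
        in_ld = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window
            and r2.get(frozenset((k["id"], snp["id"])), 0.0) > r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(snp)

    # A SNP may tag several genes; using a set counts each gene once
    # even when several independent SNPs tag it.
    genes = set()
    for snp in kept:
        genes |= snp["genes"]
    return kept, genes


def empirical_p(observed, null_counts):
    """Empirical p-value: share of null counts >= observed, with +1 correction.

    null_counts is assumed to come from some resampling scheme that the
    sketch does not implement.
    """
    exceed = sum(1 for c in null_counts if c >= observed)
    return (exceed + 1) / (len(null_counts) + 1)


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    toy_snps = [
        {"id": "rs1", "chrom": "1", "pos": 100_000, "p": 1e-5, "genes": {"A", "B"}},
        {"id": "rs2", "chrom": "1", "pos": 150_000, "p": 5e-4, "genes": {"C"}},
        {"id": "rs3", "chrom": "1", "pos": 300_000, "p": 2e-4, "genes": {"A"}},
    ]
    toy_r2 = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs1", "rs3")): 0.05}
    kept, genes = select_significant_genes(toy_snps, toy_r2)
    category = {"A", "C"}  # hypothetical gene category
    observed = len(genes & category)
    print(sorted(s["id"] for s in kept), sorted(genes), observed)
```

In this sketch the number of significant genes in a category is the size of the intersection between the returned gene set and the category's gene list; that count is what would be passed to `empirical_p` together with counts obtained under the null.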