We see significant scope for improvement in at least three aspects of this approach. The first relates to missed variant calls, whether due to low coverage or because some variants are not easily identified with current sequencing platforms (e.g. within repeat tracts in coding sequences). The second is that our filtering relied on a public SNP database (dbSNP) in which variation is ascertained highly unevenly across the genome. It would be better to rely on catalogs of common variation that are ascertained in a single study, either exome-wide (as with the 8 HapMap exomes²) or genome-wide (as with the 1000 Genomes Project), and for which estimates of allele frequency are available. Increasing the number of control exomes progressively reduces the relevance of dbSNP to this analysis (Supplementary Figure 2). Furthermore, as increasingly deep catalogs of polymorphism become available, it may be necessary to establish frequency-based thresholds for defining “common” variation that is unlikely to be causal. A third concern is that the specificity of this approach is currently reduced by a subset of genes that recurrently appear