Chunk #22 — Statistical methods for the analysis of rare variants

Source: Exome sequencing and the genetic basis of complex traits.

An important consideration for exome sequencing studies is choosing a significance threshold that accounts for multiple testing. A simple approach is to apply a Bonferroni correction for 20,000 independent tests (one test per gene). For an experiment-wide significance level of 0.05, this gives a per-gene p-value threshold of 0.05 / 20,000 = 2.5 × 10⁻⁶. However, such a threshold may be overly conservative because it assumes that every tested gene carries enough variation for the test statistic to reach its asymptotic distribution. For example, if only 2 individuals carry non-synonymous variants in a given gene, the difference between cases and controls can never exceed 2 observations. The most significant p-value achievable is then around 0.25, assuming that these 2 variants are independent (e.g., with equal numbers of cases and controls, the chance that both carriers are cases is (1/2)² = 0.25). Therefore, unless the study is large, association p-values will generally be less significant than expected under the null hypothesis. Figure 2a demonstrates this effect in the 438 whole exomes. The PLINK/SEQ suite computes from the data the so-called i-stat, an estimate of the minimal achievable p-value for a gene. The i-stat can be used by