
Chunk #23 — Statistical methods for the analysis of rare variants

Source
Exome sequencing and the genetic basis of complex traits.
Embedded
yes

Text

the null hypothesis. Figure 2a demonstrates this effect on the 438 whole exomes. The PLINK/SEQ suite computes from the data the so-called i-stat, an estimate of the minimal achievable p-value for a gene. The i-stat can be used by setting a threshold (e.g., 10⁻³) and correcting only for the number of genes whose i-stat falls below that threshold, on the reasoning that genes with an i-stat above the threshold have no power to show an association. Another way to correct for multiple testing is to compute an experiment-wide significance threshold by permutation: permute the phenotype labels, build the empirical distribution of the minimal p-value across all genes over the permutations, and compare the minimal p-value from the real data to that distribution (Figure 2b). This approach efficiently controls Type-I error and is less conservative than the Bonferroni correction. Importantly, the p-value threshold computed by permutation depends on both the study and the statistical test. However, the experiment-wide correction via permutation is not robust to confounding, so it is essential to assess the quality of the distribution of test statistics for those genes with i-stats below the threshold, to ensure that the distribution is appropriately calibrated. Nevertheless,
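
The two corrections described in this passage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PLINK/SEQ implementation: the gene-level test is a one-sided Fisher exact burden test on carrier counts (a stand-in chosen for simplicity), the i-stat-like quantity is taken to be the smallest p-value that test can return given a gene's carrier count, and all function names are hypothetical.

```python
from math import comb
import random


def fisher_one_sided(k_case, k, n_case, n):
    """P(X >= k_case) for X ~ Hypergeometric(n samples, n_case cases, k carriers)."""
    upper = min(k, n_case)
    tail = sum(comb(n_case, x) * comb(n - n_case, k - x)
               for x in range(k_case, upper + 1))
    return tail / comb(n, k)


def min_achievable_p(k, n_case, n):
    """Assumed i-stat analogue: the p-value if as many carriers as possible are cases."""
    return fisher_one_sided(min(k, n_case), k, n_case, n)


def gene_p_values(carriers, is_case):
    """carriers: one list of carrier sample indices per gene; is_case: 0/1 per sample."""
    n, n_case = len(is_case), sum(is_case)
    return [fisher_one_sided(sum(is_case[i] for i in gene), len(gene), n_case, n)
            for gene in carriers]


def bonferroni_on_testable(carriers, is_case, i_stat_cutoff=1e-3, alpha=0.05):
    """Correct only for genes whose minimal achievable p-value is below the cutoff."""
    n, n_case = len(is_case), sum(is_case)
    testable = [g for g, gene in enumerate(carriers)
                if min_achievable_p(len(gene), n_case, n) < i_stat_cutoff]
    threshold = alpha / len(testable) if testable else None
    return testable, threshold


def experiment_wide_p(carriers, is_case, n_perm=1000, seed=0):
    """Compare the observed minimal p-value to its permutation null distribution."""
    rng = random.Random(seed)
    observed = min(gene_p_values(carriers, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute phenotype labels, keep genotypes fixed
        if min(gene_p_values(carriers, labels)) <= observed:
            hits += 1
    # Add-one estimate so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the null distribution is rebuilt from the same genotypes and the same test, the resulting threshold is specific to the study and the statistic, as the text notes. The permutation step assumes exchangeable phenotype labels, which is exactly what confounding (for example, population stratification) violates.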