Chunk #0 — 1 Introduction

Source: RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data.
Embedded: yes

Text

Next-generation sequencing and microarray genotyping have enabled the cost-effective interrogation of the full spectrum of sequence variants. There is considerable interest in understanding the functional role of rare and low-frequency variants in the etiology of complex diseases, and efficient software for sequence-based association analysis is in great demand. However, several computational challenges must be addressed. First, determining the optimal analysis units and prioritizing causal variants in sequence-based association analyses requires integrating multiple sources of information, including annotations and functional prediction scores. For example, simulation studies show that it is beneficial to analyze non-synonymous variants in gene-level association tests (Kryukov et al., 2009) and that power can be improved by incorporating functional prediction scores, provided the scores are correlated with variant causality (Byrnes et al., 2013). An automated pipeline is needed to integrate this information and facilitate association analyses. Second, whole-genome datasets of many thousands of individuals often contain tens of millions of variants and can exceed 100 GB in size even after compression, orders of magnitude larger than a typical array-based GWAS dataset. Developing efficient tools that scale to these datasets is a critical yet daunting task.