Chunk #25 — Specific analysis models — Approaches based on similarities among individual sequences

Source: Statistical analysis strategies for association studies involving rare variants.
Embedded: yes

Text

In the absence of knowledge of which rare variants to collapse or consider as a set, one could search for a subset of variants that maximally discriminates between groups (for example, cases and controls) based on the distances between the sequences in the two groups [66]. Permutation methods could be used to derive p-values for discriminative ability. Searches for optimal sets of variants in this manner have parallels to the approach underlying logic regression [67] and the method of Han and Pan [40], which are discussed later in the section on regression methods. Although intuitively appealing, such methods are problematic in that determining an optimal subset of variants based on group differences can be computationally intensive. In addition, if a large enough genomic region is considered, one could merely ‘collapse’ all variants found only in cases and, separately, all variants found only in controls, resulting in a set of variants that perfectly discriminates cases from controls. This possibility emphasizes the need to consider functional annotations in relevant data analyses, or other ways of circumscribing the rare variants to be treated as a collapsed set.
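The subset search, its permutation p-value, and the degenerate ‘collapse’ described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the cited method: sequences are coded as 0/1 carrier vectors, distance is Hamming distance, the discrimination statistic is mean between-group distance minus mean within-group distance, and the search is a simple greedy forward selection. All function names are hypothetical.

```python
import random
from itertools import combinations

def hamming(a, b, subset):
    """Number of positions in `subset` at which sequences a and b differ."""
    return sum(a[j] != b[j] for j in subset)

def separation(seqs, labels, subset):
    """Mean between-group distance minus mean within-group distance."""
    between, within = [], []
    for i, k in combinations(range(len(seqs)), 2):
        d = hamming(seqs[i], seqs[k], subset)
        (between if labels[i] != labels[k] else within).append(d)
    if not between or not within:
        return 0.0
    return sum(between) / len(between) - sum(within) / len(within)

def greedy_subset(seqs, labels, max_size=3):
    """Forward selection: repeatedly add the variant that most improves separation."""
    chosen, best = [], 0.0
    n_variants = len(seqs[0])
    while len(chosen) < max_size:
        scored = [(separation(seqs, labels, chosen + [j]), j)
                  for j in range(n_variants) if j not in chosen]
        if not scored:
            break
        score, j = max(scored)
        if score <= best:  # stop when no variant improves the statistic
            break
        chosen.append(j)
        best = score
    return chosen, best

def permutation_p_value(seqs, labels, n_perm=200, max_size=3, seed=0):
    """Permute labels and repeat the whole search each time, so the
    p-value accounts for the selection of the subset."""
    rng = random.Random(seed)
    _, observed = greedy_subset(seqs, labels, max_size)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, null_score = greedy_subset(seqs, shuffled, max_size)
        if null_score >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def collapse_case_only(seqs, labels):
    """Collapse every variant seen in cases (label 1) but in no control (label 0)
    into one carrier indicator. Controls are never flagged, by construction."""
    n_variants = len(seqs[0])
    case_only = [
        j for j in range(n_variants)
        if any(s[j] for s, y in zip(seqs, labels) if y == 1)
        and not any(s[j] for s, y in zip(seqs, labels) if y == 0)
    ]
    return [int(any(s[j] for j in case_only)) for s in seqs]
```

In this sketch, `collapse_case_only` shows why an unrestricted collapse is uninformative: no control can ever carry the collapsed set, and once the region is large enough that every case carries at least one case-only variant, the indicator reproduces the case labels exactly. Repeating the search inside each permutation, as `permutation_p_value` does, is one way to keep the reported p-value honest about that selection step, at the computational cost noted in the text.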