Chunk #28 — Specific analysis models — Multiple regression and data mining methods

Source
Statistical analysis strategies for association studies involving rare variants.

Text

Multiple regression models have been applied in many standard GWA studies in an effort to identify the most likely causal variants in a genomic region harboring many associated variants [72, 73]. However, applying them directly, via simple extensions of the methods described by Morris and Zeggini [33], to the analysis of multiple individual rare variants or collapsed sets of variants may be problematic. For example, collapsed sets of variants might be correlated due to LD with an additional common variant included in the model, or due to the way different subsets of variants are collapsed based on functional annotations, as discussed previously in the context of the hierarchical nature of annotation-based collapsing. Strong multicollinearity of this kind is known to cause numerical and interpretation problems in traditional linear regression analysis. In addition, there will likely be many potential predictor variables to choose from if many individual common and rare variants, as well as collapsed sets of variants, are considered. Having many independent variables, or more independent variables than subjects, creates enormous potential for numerical instability and overfitting in standard linear regression models.
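
The two problems described above can be illustrated with a minimal simulation sketch. It is not the method of any cited study: the sample sizes, the carrier probability of 0.02, the allele frequency of 0.3, and all variable names are assumptions chosen for illustration. Part 1 collapses a set of rare variants and a nested, annotation-style subset of it into carrier indicators and measures how correlated the two resulting predictors are. Part 2 fits ordinary least squares with more predictors than subjects to a phenotype that is pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Part 1: nested collapsed sets are strongly correlated.
n_subjects, n_rare = 200, 30
# Hypothetical carrier indicators: each subject carries each rare variant
# with probability 0.02 (an assumed value, not taken from the source).
carriers = rng.random((n_subjects, n_rare)) < 0.02

# Collapse to "carries at least one variant in the set" indicators. The
# second set (e.g. variants sharing a functional annotation) is a subset
# of the first, so the two indicators overlap by construction.
any_variant = carriers.any(axis=1).astype(float)
any_functional = carriers[:, :25].any(axis=1).astype(float)

r = np.corrcoef(any_variant, any_functional)[0, 1]
vif = 1.0 / (1.0 - r**2)  # variance inflation factor with two predictors


# Part 2: more predictors than subjects lets OLS fit pure noise.
def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


n_train, n_test, n_pred = 40, 200, 80


def simulate(n):
    """Additively coded genotypes (0/1/2) plus an unrelated phenotype."""
    genotypes = rng.binomial(2, 0.3, size=(n, n_pred)).astype(float)
    X = np.column_stack([np.ones(n), genotypes])  # intercept + predictors
    y = rng.normal(size=n)  # phenotype is noise: no variant has an effect
    return X, y


X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)

# With more columns than rows, lstsq returns the minimum-norm solution,
# which reproduces the training phenotypes essentially exactly.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
r2_train = r_squared(y_train, X_train @ beta)
r2_test = r_squared(y_test, X_test @ beta)

print(f"correlation between collapsed sets: {r:.2f} (VIF {vif:.1f})")
print(f"R^2 on training subjects: {r2_train:.3f}")
print(f"R^2 on new subjects:      {r2_test:.3f}")
```

Under these assumptions the nested collapsed indicators are highly correlated, which inflates the variance of their coefficient estimates when both enter the same model, and the overparameterized fit explains the training phenotypes almost perfectly while failing on new subjects. This is the overfitting pattern the text warns about.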