Chunk #114 — Features and Pitfalls — “Small n Large p” Applicability

Source: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
Embedded: yes

Text

The fact that variable selection can be limited to random subsets makes random forests particularly well suited to “small n large p” problems, in which there are many more variables than observations, and has added much to their popularity. However, even if the set of candidate predictor variables is not restricted as in random forests, but covers all predictor variables as in bagging, the search is only a question of computational effort. In logistic regression models, for example, parameter estimation is not possible when there are too many predictor variables and too few observations (for instance, due to linear constraints in the predictors or perfect separation of response classes in some predictor combinations). Tree-based methods like bagging and random forests, by contrast, consider only one predictor variable at a time and can thus deal with large numbers of variables sequentially. Therefore Bureau et al. (2005) and Heidema, Boer, Nagelkerke, Mariman, van der A, and Feskens (2006) point out that the recursive partitioning strategy is a clear advantage of random forests over more common methods like logistic regression in high-dimensional settings.
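
The one-variable-at-a-time search described above can be illustrated with a minimal sketch. It is not taken from the source: the helper names (`gini`, `best_split`), the simulated data, the Gini criterion, and the choice of `mtry` as the square root of p are all assumptions made for illustration. The sketch searches for a single binary split in a data set with 20 observations and 500 predictors, first over all predictors (as in bagging) and then over a random subset (as in random forests).

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)

def best_split(X, y, candidates):
    """Scan the candidate predictors one at a time and return
    (weighted impurity, variable index, cutpoint) of the best binary split."""
    n = len(y)
    best = (float("inf"), None, None)
    for j in candidates:
        values = sorted(set(row[j] for row in X))
        # Midpoints between adjacent distinct values serve as cutpoints.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= cut]
            right = [y[i] for i in range(n) if X[i][j] > cut]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, j, cut)
    return best

# Simulated "small n large p" data (assumption): n = 20, p = 500.
random.seed(0)
n, p = 20, 500
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
# Hypothetical signal: the response depends on predictor 7 only.
y = [1 if row[7] > 0 else 0 for row in X]

# Bagging-style search: every predictor is a candidate.
score_all, var_all, cut_all = best_split(X, y, range(p))

# Random-forest-style search: only a random subset of mtry predictors.
mtry = int(p ** 0.5)
subset = random.sample(range(p), mtry)
score_sub, var_sub, cut_sub = best_split(X, y, subset)

print("all predictors:", var_all, score_all)
print("random subset :", var_sub, score_sub)
```

In this sketch the cost of the search grows with the number of candidate predictors, but each predictor is evaluated on its own, so having p much larger than n never makes the split search ill-defined. Restricting the candidates to a random subset only shortens the loop.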