Chunk #69 — The Methods — How Do Ensemble Methods Work? — Random Forests

Source: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
Embedded: yes

Text

Besides intuitive explanations of “how ensemble methods work”, recent publications have contributed to a deeper understanding of the statistical background behind many machine learning methods. Bühlmann and Yu (2002) provided the statistical framework for bagging; Friedman, Hastie, and Tibshirani (2000) and Bühlmann and Yu (2003) did so for the related method of boosting; and, most recently, Lin and Jeon (2006) and Biau et al. (2008) did so for random forests. Lin and Jeon (2006) explore the statistical properties of random forests by placing them in a k-nearest neighbor (k-NN) framework, in which random forests can be viewed as adaptively weighted k-NN, with the terminal node size determining the size of the neighborhood. However, to make a computationally complex method like random forests, which involves several layers of random sampling, mathematically tractable, several simplifying assumptions are necessary. Therefore, well-planned simulation studies still offer valuable assistance for evaluating statistical aspects of the method in its original form.
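
The weighted-neighbor view can be illustrated with a small sketch. It is not the construction analyzed by Lin and Jeon (2006), only a toy under loudly stated assumptions: a single predictor, purely random split points, no bootstrap sampling, and hypothetical names (`build_tree`, `leaf_of`, `MIN_LEAF`) chosen for this example. Under these assumptions, the forest prediction at a query point equals a weighted average of the training responses, where only observations that share a terminal node with the query point receive nonzero weight, and each tree contributes at most `MIN_LEAF` such neighbors.

```python
import math
import random

MIN_LEAF = 5    # maximum terminal node size (assumed value for illustration)
N_TREES = 50    # number of randomized trees (assumed value)

def build_tree(idx, X, min_leaf, rng):
    """Recursively partition 1-D points with random splits.
    A leaf stores the indices of the training points it contains."""
    values = sorted(set(X[i] for i in idx))
    if len(idx) <= min_leaf or len(values) < 2:
        return {"leaf": list(idx)}
    # Split at a randomly chosen observed value (never the largest one),
    # so both child nodes are guaranteed to be non-empty.
    thr = values[rng.randrange(len(values) - 1)]
    left = [i for i in idx if X[i] <= thr]
    right = [i for i in idx if X[i] > thr]
    return {"thr": thr,
            "left": build_tree(left, X, min_leaf, rng),
            "right": build_tree(right, X, min_leaf, rng)}

def leaf_of(tree, x):
    """Return the training indices in the terminal node that x falls into."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["thr"] else tree["right"]
    return tree["leaf"]

# Simulated training data: a noisy sine curve on [0, 1].
rng = random.Random(0)
n = 200
X = [rng.uniform(0.0, 1.0) for _ in range(n)]
y = [math.sin(6.0 * x) + rng.gauss(0.0, 0.1) for x in X]

forest = [build_tree(list(range(n)), X, MIN_LEAF, rng) for _ in range(N_TREES)]
x0 = 0.5  # query point

# Usual forest prediction: average of the per-tree terminal node means.
forest_pred = sum(
    sum(y[i] for i in leaf_of(t, x0)) / len(leaf_of(t, x0)) for t in forest
) / N_TREES

# Weighted-neighbor form: observation i gets weight
# (1 / N_TREES) * sum over trees of 1[i shares a leaf with x0] / leaf size.
weights = [0.0] * n
for t in forest:
    leaf = leaf_of(t, x0)
    for i in leaf:
        weights[i] += 1.0 / (len(leaf) * N_TREES)

weighted_pred = sum(w * yi for w, yi in zip(weights, y))
n_neighbors = sum(1 for w in weights if w > 0.0)
print(forest_pred, weighted_pred, n_neighbors)
```

Reducing `MIN_LEAF` shrinks the set of observations with nonzero weight, which is the sense in which the terminal node size controls the neighborhood size in this toy setting.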