Chunk #143 — Features and Pitfalls — Do Random Forests Overfit?

Source: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Text

The study presented in Breiman (2001b), which states that random forests do not overfit (a claim that has been cited extensively ever since), may be a prominent example of a premature conclusion drawn from an unrepresentative sample. Many studies that explore the characteristics of machine learning tools such as random forests are based on only a few real data sets that happen to be freely available in some data repository. The particular data sets investigated by Breiman (2001b) seem to support the impression that random forests do not overfit, but this notion has been heavily criticized by Segal (2004).
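
One way to probe a claim of this kind on a given data set is to compare training error with error on held-out data for different tree sizes. The following is a minimal sketch under stated assumptions, not the procedure used by Breiman (2001b) or Segal (2004): the data are synthetic (a sine curve plus Gaussian noise), there is only one predictor, so the ensemble is plain bagging of regression trees rather than a full random forest with random variable selection, and all names (`make_data`, `grow_tree`, `fit_forest`, `compare`, `min_leaf`) are hypothetical and introduced only for illustration.

```python
import math
import random


def make_data(n, rng, noise_sd=0.5):
    """Draw n points from y = sin(x) + Gaussian noise (synthetic, for illustration only)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def grow_tree(points, min_leaf):
    """Recursively split (x, y) pairs, sorted by x, at the cut that minimizes squared error."""
    n = len(points)
    total = sum(y for _, y in points)
    if n < 2 * min_leaf:
        return total / n  # leaf node: predict the mean response
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += points[i - 1][1]
        # Skip cuts that violate the leaf size or fall between identical x values.
        if i < min_leaf or n - i < min_leaf or points[i - 1][0] == points[i][0]:
            continue
        # Maximizing this score is equivalent to minimizing the within-node sum of squares.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if best is None or score > best[0]:
            best = (score, i)
    if best is None:
        return total / n
    i = best[1]
    cut = (points[i - 1][0] + points[i][0]) / 2
    return (cut, grow_tree(points[:i], min_leaf), grow_tree(points[i:], min_leaf))


def predict_tree(tree, x):
    """Walk down the tree until a leaf (a plain float) is reached."""
    while isinstance(tree, tuple):
        cut, left, right = tree
        tree = left if x <= cut else right
    return tree


def fit_forest(xs, ys, n_trees, min_leaf, rng):
    """Grow each tree on a bootstrap sample of the training data (bagging)."""
    data = list(zip(xs, ys))
    return [grow_tree(sorted(rng.choice(data) for _ in data), min_leaf)
            for _ in range(n_trees)]


def mse(forest, xs, ys):
    """Mean squared error of the averaged ensemble prediction."""
    err = 0.0
    for x, y in zip(xs, ys):
        pred = sum(predict_tree(t, x) for t in forest) / len(forest)
        err += (pred - y) ** 2
    return err / len(xs)


def compare(seed=1, n_trees=50):
    """Return {min_leaf: (training MSE, held-out MSE)} for fully grown and restricted trees."""
    rng = random.Random(seed)
    x_train, y_train = make_data(60, rng)
    x_test, y_test = make_data(200, rng)
    results = {}
    for min_leaf in (1, 10):  # 1 = fully grown trees, 10 = restricted tree size
        forest = fit_forest(x_train, y_train, n_trees, min_leaf, rng)
        results[min_leaf] = (mse(forest, x_train, y_train), mse(forest, x_test, y_test))
    return results


if __name__ == "__main__":
    for min_leaf, (train_err, test_err) in compare().items():
        print(f"min_leaf={min_leaf:2d}  train MSE={train_err:.3f}  held-out MSE={test_err:.3f}")
```

A training error far below the held-out error for the fully grown trees indicates that the ensemble has adapted to noise in the training sample. Whether restricting the tree size actually improves held-out error depends on the data, the noise level, and the seed, which is the reason conclusions drawn from a handful of data sets should be treated with caution.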