Chunk #146 — Features and Pitfalls — Do Random Forests Overfit?

Source
An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Text

Most previous publications have argued that each individual tree in an ensemble should be grown as large as possible and should not be pruned. The recent results of Lin and Jeon (2006), however, show that creating large trees is not necessarily the optimal strategy. In problems with a large number of observations and few variables, a better convergence rate (of the mean squared error as a measure of prediction accuracy) can be achieved when the terminal node size increases with the sample size (i.e., when smaller trees are grown for larger samples). On the other hand, for problems with small sample sizes, or even “small n large p” problems, growing large trees usually does lead to the best performance.
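
As a rough illustration of this advice, the following minimal sketch picks a terminal node size from the number of observations and variables. The function names, the square-root growth rule, and the cutoff of 200 observations are all illustrative assumptions; none of them is a rate or threshold taken from Lin and Jeon (2006).

```python
import math


def terminal_node_size(n_samples: int, exponent: float = 0.5, minimum: int = 1) -> int:
    """Hypothetical rule: the terminal node size grows as n_samples ** exponent.

    The exponent is an assumption made for illustration only.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    return max(minimum, math.ceil(n_samples ** exponent))


def choose_node_size(n_samples: int, n_features: int, small_n: int = 200) -> int:
    """Pick a terminal node size following the qualitative advice in the text."""
    # Small samples or "small n large p": grow large trees,
    # which means keeping the terminal nodes as small as possible.
    if n_samples < small_n or n_samples <= n_features:
        return 1
    # Many observations and few variables: let the node size grow with n,
    # which yields smaller trees than full growth would.
    return terminal_node_size(n_samples)


if __name__ == "__main__":
    for n, p in [(50, 1000), (1_000, 5), (100_000, 5)]:
        print(f"n={n}, p={p}: terminal node size {choose_node_size(n, p)}")
```

The resulting value could be passed to whichever tuning parameter controls the terminal node size in a given implementation, for example `min_samples_leaf` in scikit-learn's random forest estimators or `nodesize` in the R package randomForest. In practice the value would usually be tuned (for example by cross-validation) rather than fixed by a formula.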