Chunk #123 — Features and Pitfalls — Out-of-Bag Error Estimation

Source: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
Embedded: yes

Text

As mentioned earlier and used in the application example, bagging and random forests come with their own “built-in” test sample, the out-of-bag observations (those not included in the learning sample of a given tree), which provides a fair means of error estimation (Breiman 1996b). Of course, similar validation strategies, based either on sample splitting or resampling techniques (see, e.g., Hothorn, Leisch, Zeileis, and Hornik 2005; Boulesteix et al. 2008) or, ideally, on external validation samples (König, Malley, Weimar, Diener, and Ziegler 2007), can and should be applied to any statistical method. However, in many disciplines intensive model validation is not common practice. A method that comes with a built-in test sample, like random forests, may therefore help raise awareness of the issue and relieve the user of having to choose an appropriate validation scheme.
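
To make the idea of a built-in test sample concrete, the following is a minimal sketch of out-of-bag error estimation, not the authors' code. It assumes plain bagging of one-split trees ("stumps") on a single synthetic predictor with bootstrap sampling; it has no random predictor selection, so it is not a full random forest. All names (`fit_stump`, `predict_stump`, `oob_error`) and the simulated data are hypothetical and chosen for illustration only.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split classifier on one feature: predict `left` if x <= t, else `right`."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return t, left, right

def predict_stump(stump, x):
    t, left, right = stump
    return left if x <= t else right

def oob_error(xs, ys, n_trees=50, seed=0):
    """Bagging with out-of-bag error: each observation is predicted only by
    the trees whose bootstrap sample did not contain it."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # out-of-bag votes for class 0 and class 1
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw n with replacement
        in_bag = set(idx)
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # observation i is out-of-bag for this tree
                votes[i][predict_stump(stump, xs[i])] += 1
    # Score only observations that were out-of-bag at least once.
    scored = [(v, y) for v, y in zip(votes, ys) if sum(v) > 0]
    # Majority vote over out-of-bag trees; ties are resolved in favor of class 0.
    wrong = sum((1 if v[1] > v[0] else 0) != y for v, y in scored)
    return wrong / len(scored), len(scored)

# Simulated data (an assumption for illustration): class 1 if x > 0.5,
# with roughly 10% of the labels flipped at random.
data_rng = random.Random(1)
xs = [data_rng.random() for _ in range(200)]
ys = [int(x > 0.5) ^ int(data_rng.random() < 0.1) for x in xs]

err, n_scored = oob_error(xs, ys)
print(f"OOB error: {err:.3f} over {n_scored} observations")
```

Because each observation is predicted only by trees that never saw it during fitting, the resulting error rate behaves like a test-sample estimate without requiring a separate hold-out set. With a bootstrap sample of size n, roughly a third of the observations are out-of-bag for any given tree, so with enough trees nearly every observation receives an out-of-bag prediction.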