Chunk #134 — Features and Pitfalls — Tests for Variable Importance and Variable Selection

Source
An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Text

This approach may appear more statistically advanced than a purely descriptive use of the random forest variable importance. However, its statistical properties are so alarming that any statement of significance based on this test is invalid (Strobl and Zeileis 2008): the power of the test depends on the number of trees in the ensemble, ntree, over which the importance is averaged (cf. Equations 2 and 3 in section “Variable importance”; see also Lunetta et al. 2004). Reporting the significance of variable importance scores (as in, e.g., Baca-Garcia et al. 2007, who do not even report the parameter settings used to fit the random forest) can therefore be highly misleading, because the number of variables whose scores exceed a given significance threshold depends on the arbitrary choice of a tuning parameter.
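The dependence on ntree can be illustrated with a small, stylized simulation. The sketch below assumes the test statistic is the mean per-tree importance divided by its estimated standard error (standard deviation over the square root of ntree). It does not fit a random forest: the per-tree importances are simply drawn from a normal distribution, and the names `z_score`, `simulate_z`, `true_mean`, and `true_sd` are hypothetical, chosen only for this illustration.

```python
import math
import random
import statistics


def z_score(per_tree_importances):
    """Mean importance divided by its estimated standard error.

    Assumption: the test statistic has the form mean / (sd / sqrt(ntree)),
    where ntree is the number of per-tree importance values.
    """
    ntree = len(per_tree_importances)
    mean = statistics.fmean(per_tree_importances)
    sd = statistics.stdev(per_tree_importances)
    return mean / (sd / math.sqrt(ntree))


def simulate_z(ntree, true_mean=0.1, true_sd=1.0, seed=0):
    """Draw ntree hypothetical per-tree importances and return their z-score.

    The normal distribution and its parameters are illustrative assumptions,
    not properties of any real forest.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(true_mean, true_sd) for _ in range(ntree)]
    return z_score(draws)


if __name__ == "__main__":
    # The underlying importance distribution is identical in every row;
    # only the number of trees changes.
    for ntree in (100, 500, 2000, 10000):
        print(ntree, round(simulate_z(ntree), 2))
```

Under these assumptions the z-score grows roughly in proportion to the square root of ntree, so a variable with a small but fixed mean importance can be pushed past any fixed significance threshold simply by growing more trees.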