Chunk #134 — Features and Pitfalls — Tests for Variable Importance and Variable Selection

Source: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
Embedded: yes

Text

This approach may appear more statistically advanced than a merely descriptive use of the random forest variable importance. However, its statistical properties are so alarming that any statement of significance made with this test is nullified (Strobl and Zeileis 2008): the power of the test depends on ntree, the number of trees in the ensemble over which the importance is averaged (cf. Equations 2 and 3 in section “Variable importance”; see also Lunetta et al. 2004). Reporting the significance of variable importance scores can therefore be highly misleading (see, e.g., Baca-Garcia et al. 2007, who do not even report the parameter settings they used for fitting the random forest), because the number of variables whose scores exceed a given significance threshold depends on the arbitrary choice of a tuning parameter.
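The dependence on ntree can be illustrated with a small simulation. The sketch below assumes that the test statistic is the mean of the per-tree importances divided by its standard error, i.e. z = mean / (sd / sqrt(ntree)), which is how the reference to Equations 2 and 3 is read here. The per-tree importances are synthetic draws from a normal distribution with mean 0.1 and standard deviation 1.0; these values, the seed, and the function name `importance_z_score` are illustrative assumptions and are not taken from the paper.

```python
import math
import random
import statistics


def importance_z_score(per_tree_importances):
    """Mean per-tree importance divided by its standard error.

    Assumed form of the test statistic: z = mean / (sd / sqrt(ntree)).
    """
    ntree = len(per_tree_importances)
    mean = statistics.fmean(per_tree_importances)
    sd = statistics.stdev(per_tree_importances)
    return mean / (sd / math.sqrt(ntree))


rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Synthetic per-tree importances (NOT real random forest output):
# a weak average effect (0.1) with a large tree-to-tree spread (1.0).
draws = [rng.gauss(0.1, 1.0) for _ in range(10_000)]

# The same variable, "tested" with ensembles of increasing size.
z_by_ntree = {
    ntree: importance_z_score(draws[:ntree])
    for ntree in (100, 1_000, 10_000)
}

for ntree, z in z_by_ntree.items():
    print(f"ntree={ntree:>6}  z={z:6.2f}")
```

Under these assumptions the mean and standard deviation of the per-tree importances stay roughly constant, so z grows approximately in proportion to sqrt(ntree). A variable with the same underlying importance can thus fall below or above a fixed critical value (for example 1.645 for a one-sided test at the 5% level) depending only on how many trees were grown.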