Chunk #82 — The Methods — Variable Importance

Source: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Text

In principle, a naive variable importance measure would be to merely count the number of times each variable is selected for splitting by the individual trees in the ensemble. More elaborate variable importance measures incorporate a (weighted) mean of the individual trees' improvement in the splitting criterion produced by each variable (Friedman 2001). An example of such a measure in classification is the "Gini importance" available in random forest implementations. It describes the average improvement in the "Gini gain" splitting criterion that a variable has achieved across all of its positions in the forest. However, in many applications involving predictor variables of different types (for example, continuous predictors mixed with categorical predictors that have different numbers of categories), this measure is biased, as outlined in section "Bias in variable selection and variable importance".
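To make the contrast between the two measures concrete, the following Python sketch computes both from trees that have already been grown. It is a minimal illustration under stated assumptions, not the implementation of any particular random forest package. The `Split` and `Tree` records and the toy class counts are hypothetical. Each node's Gini gain is weighted by the fraction of the tree's observations that reach that node, and the per-tree sums are then averaged over the forest. This is one common weighting convention, assumed here; implementations differ in details such as normalization.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a node with the given class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


@dataclass
class Split:
    """One internal node (hypothetical format): the splitting variable and
    the class counts in the parent node and in its two child nodes."""
    variable: str
    parent: Sequence[int]
    left: Sequence[int]
    right: Sequence[int]

    def gini_gain(self) -> float:
        """Impurity decrease G(t) - (n_L/n_t) G(t_L) - (n_R/n_t) G(t_R)."""
        n = sum(self.parent)
        n_l, n_r = sum(self.left), sum(self.right)
        return (gini(self.parent)
                - (n_l / n) * gini(self.left)
                - (n_r / n) * gini(self.right))


@dataclass
class Tree:
    """A grown tree reduced to its splits; n_samples is the root node size."""
    n_samples: int
    splits: List[Split]


def selection_counts(forest: List[Tree]) -> Dict[str, int]:
    """Naive measure: number of splits on each variable across all trees."""
    return dict(Counter(s.variable for tree in forest for s in tree.splits))


def gini_importance(forest: List[Tree]) -> Dict[str, float]:
    """Per tree, sum each variable's Gini gains weighted by the share of the
    tree's observations reaching the node (one common weighting convention,
    assumed here), then average these sums over all trees in the forest."""
    totals: Dict[str, float] = defaultdict(float)
    for tree in forest:
        for s in tree.splits:
            weight = sum(s.parent) / tree.n_samples
            totals[s.variable] += weight * s.gini_gain()
    return {v: total / len(forest) for v, total in totals.items()}


# Two toy trees with made-up class counts (8 observations, 2 classes).
example_forest = [
    Tree(8, [Split("x1", [4, 4], [4, 0], [0, 4])]),
    Tree(8, [Split("x2", [4, 4], [3, 1], [1, 3]),
             Split("x1", [3, 1], [3, 0], [0, 1]),
             Split("x1", [1, 3], [1, 0], [0, 3])]),
]

print(selection_counts(example_forest))  # {'x1': 3, 'x2': 1}
print(gini_importance(example_forest))   # {'x1': 0.4375, 'x2': 0.0625}
```

In this toy forest, x1 is selected three times and x2 once, so the naive count already ranks x1 first, but it carries no information about split quality. The Gini importance additionally reflects that x2's single split only partly separates the classes (gain 0.125), whereas every split on x1 produces pure child nodes.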