Chunk #16 — Background — Random forest (RF) — Variable importance measures

Source: Random forest versus logistic regression: a large-scale benchmark experiment.

Text

As a byproduct of fitting a random forest, the built-in variable importance measures (VIMs) rank the variables (i.e., the features) by their relevance for prediction [2]. The so-called Gini VIM has been shown to be strongly biased [14], in particular in favor of variables that offer many possible split points, such as continuous variables or categorical variables with many categories. The second common VIM, the permutation-based VIM, is directly based on the prediction accuracy of the RF. For each tree, the out-of-bag (OOB) error is computed before and after randomly permuting the values of the considered variable, and the VIM is the mean of the resulting increases in OOB error over the ntree trees. The underlying idea is that permuting a variable breaks its association with the response, so permuting an important variable is expected to decrease accuracy more strongly than permuting an unimportant one.
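The permutation-based VIM can be sketched in a few lines of Python. This is a minimal, self-contained sketch under simplifying assumptions: each "tree" is a hypothetical depth-1 classification stump rather than a fully grown CART tree, the error measure is the misclassification rate, and all helper names (`fit_stump`, `fit_forest`, `permutation_vim`) are illustrative only, not the API of any RF package. The part that follows the text is the VIM itself: for each tree, the OOB error is recomputed after permuting one variable among that tree's OOB observations, and the increases are averaged over the ntree trees.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A fitted tree is modeled as a function mapping one row to a 0/1 class label.
Predictor = Callable[[Sequence[float]], int]


def majority(labels: List[int]) -> int:
    """Majority class for 0/1 labels; empty nodes predict 0, ties go to 1."""
    if not labels:
        return 0
    return int(2 * sum(labels) >= len(labels))


def fit_stump(X: List[List[float]], y: List[int], rows: List[int],
              features: List[int]) -> Predictor:
    """Hypothetical stand-in for an RF tree: a single split chosen among
    `features` that minimizes misclassified (bootstrap) training rows."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in features:
        for thr in sorted({X[i][j] for i in rows}):
            left = [y[i] for i in rows if X[i][j] <= thr]
            right = [y[i] for i in rows if X[i][j] > thr]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, thr, l_lab, r_lab)
    _, feat, cut, l_lab, r_lab = best
    return lambda row: l_lab if row[feat] <= cut else r_lab


def fit_forest(X: List[List[float]], y: List[int], ntree: int, mtry: int,
               rng: random.Random) -> List[Tuple[Predictor, List[int]]]:
    """Fit ntree stumps on bootstrap samples; keep each tree's OOB row indices."""
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        feats = rng.sample(range(p), mtry)              # random candidate features
        forest.append((fit_stump(X, y, boot, feats), oob))
    return forest


def error_rate(predict: Predictor, rows: List[Sequence[float]], labels: List[int]) -> float:
    """Misclassification rate of one tree on the given rows."""
    return sum(predict(r) != t for r, t in zip(rows, labels)) / len(rows)


def permutation_vim(forest: List[Tuple[Predictor, List[int]]], X: List[List[float]],
                    y: List[int], j: int, rng: random.Random) -> float:
    """Mean over trees of (OOB error after permuting variable j) - (OOB error before)."""
    diffs = []
    for predict, oob in forest:
        if not oob:
            continue
        rows = [list(X[i]) for i in oob]  # copies of this tree's OOB rows
        labels = [y[i] for i in oob]
        err_before = error_rate(predict, rows, labels)
        shuffled = [r[j] for r in rows]
        rng.shuffle(shuffled)             # permute variable j among the OOB rows
        for r, v in zip(rows, shuffled):
            r[j] = v
        err_after = error_rate(predict, rows, labels)
        diffs.append(err_after - err_before)
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy data: only feature 0 determines the label; feature 1 is pure noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
forest = fit_forest(X, y, ntree=50, mtry=1, rng=rng)
vims = [permutation_vim(forest, X, y, j, rng) for j in range(2)]
print(vims)
```

In this toy setting the VIM of feature 0 comes out clearly positive, while that of the noise feature stays close to zero; it can even be slightly negative, because permuting an irrelevant variable changes the OOB error only by chance.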