Chunk #125 — Features and Pitfalls — Missing Value Handling

Source: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
Embedded: yes

Text

After a splitting variable has been selected, it is unclear to which daughter node the observations with a missing value in this variable should be assigned. Therefore, a surrogate variable is selected that best predicts the values of the splitting variable. By means of this surrogate variable, the observations can then be assigned to the left or right daughter node (cf., e.g., Hastie et al. 2001). A flaw of this strategy is, however, that the permutation variable importance measure is currently not defined for variables with missing values.
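
The surrogate idea can be illustrated with a minimal sketch. It is not the implementation of any particular package: the function names `best_surrogate` and `assign`, the toy data, and the use of simple agreement with the primary split as the selection criterion are assumptions made for illustration, and the candidate variables are assumed to be fully observed.

```python
import math

def best_surrogate(primary, threshold, candidates):
    """Return (name, cutpoint, left_if_leq, agreement) for the candidate split
    that most often agrees with the primary split `primary <= threshold`.

    Missing values in `primary` are encoded as math.nan. Agreement is computed
    only on observations where the primary variable is observed.
    (Hypothetical helper; agreement rate is an assumed criterion.)
    """
    observed = [i for i, v in enumerate(primary) if not math.isnan(v)]
    goes_left = {i: primary[i] <= threshold for i in observed}
    best = None
    for name, values in candidates.items():
        for cut in sorted({values[i] for i in observed}):
            agree = sum((values[i] <= cut) == goes_left[i] for i in observed)
            # The surrogate may send small values left or right.
            for left_if_leq, n in ((True, agree), (False, len(observed) - agree)):
                score = n / len(observed)
                if best is None or score > best[3]:
                    best = (name, cut, left_if_leq, score)
    return best

def assign(primary, threshold, candidates, surrogate):
    """Assign each observation to a daughter node, falling back to the
    surrogate split when the primary variable is missing."""
    name, cut, left_if_leq, _ = surrogate
    sides = []
    for i, v in enumerate(primary):
        if math.isnan(v):
            left = (candidates[name][i] <= cut) == left_if_leq
        else:
            left = v <= threshold
        sides.append("left" if left else "right")
    return sides

# Toy data: x1 is the splitting variable with two missing values.
x1 = [1.0, 2.0, math.nan, 8.0, 9.0, math.nan]
others = {
    "x2": [10.0, 12.0, 11.0, 30.0, 33.0, 31.0],  # closely tracks x1
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0, 0.0],        # unrelated to x1
}

surrogate = best_surrogate(x1, 5.0, others)
nodes = assign(x1, 5.0, others, surrogate)
print(surrogate)
print(nodes)
```

In this toy example `x2` reproduces the primary split on all observed cases and is therefore chosen as the surrogate, so the two observations with missing `x1` are routed by their `x2` values. Real implementations typically rank several surrogates and fall back to the next one when a surrogate is itself missing.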