Chunk #125 — Features and Pitfalls — Missing Value Handling

Source
An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Text

After a splitting variable has been selected, it is unclear to which daughter node the observations with missing values in this variable should be assigned. Therefore, a surrogate variable is selected that best predicts the values of the splitting variable. By means of this surrogate variable, the observations can then be assigned to the left or right daughter node (cf., e.g., Hastie et al. 2001). A drawback of this strategy, however, is that the permutation variable importance measure is currently not defined for variables with missing values.
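
The surrogate strategy can be illustrated with a small sketch. It is not the implementation of any particular package. It assumes that the surrogate is the single variable and cutpoint whose split agrees most often with the primary split on the observations where both variables are observed, which is one common reading of "best predicts". All names (`find_surrogate`, `assign_node`) and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]              # None marks a missing value
Surrogate = Tuple[float, int, float, bool]


def find_surrogate(rows: List[Row], primary: int, threshold: float) -> Optional[Surrogate]:
    """Return (agreement, variable, cutpoint, left_if_low) for the best surrogate split."""
    best: Optional[Surrogate] = None
    for var in range(len(rows[0])):
        if var == primary:
            continue
        # Use only observations where both the primary and the candidate are observed.
        pairs = [(r[var], r[primary] <= threshold)
                 for r in rows
                 if r[primary] is not None and r[var] is not None]
        if not pairs:
            continue
        for cut in sorted({v for v, _ in pairs}):
            agree = sum((v <= cut) == went_left for v, went_left in pairs)
            # Try both orientations: low values go left, or low values go right.
            for left_if_low, score in ((True, agree), (False, len(pairs) - agree)):
                frac = score / len(pairs)
                if best is None or frac > best[0]:
                    best = (frac, var, cut, left_if_low)
    return best


def assign_node(row: Row, primary: int, threshold: float, surrogate: Surrogate) -> str:
    """Send a row left or right, falling back to the surrogate if the primary is missing."""
    if row[primary] is not None:
        return "left" if row[primary] <= threshold else "right"
    _, var, cut, left_if_low = surrogate
    if row[var] is None:
        # Assumption: this sketch does not handle a missing surrogate value.
        raise ValueError("surrogate variable is missing as well")
    is_low = row[var] <= cut
    return "left" if is_low == left_if_low else "right"


# Toy data: column 0 is the splitting variable, columns 1 and 2 are candidates.
rows: List[Row] = [
    [1.0, 10.0, 5.0],
    [2.0, 12.0, 1.0],
    [3.0, 11.0, 4.0],
    [7.0, 20.0, 2.0],
    [8.0, 22.0, 5.0],
    [9.0, 21.0, 3.0],
    [None, 9.0, 4.0],    # missing in the splitting variable
    [None, 23.0, 1.0],   # missing in the splitting variable
]

surrogate = find_surrogate(rows, primary=0, threshold=5.0)
assignments = [assign_node(r, 0, 5.0, surrogate) for r in rows]
print(surrogate)
print(assignments)
```

In this toy data, column 1 reproduces the primary split perfectly on the complete observations, so the two rows with a missing splitting value are routed by column 1 instead. Practical implementations typically keep a ranked list of several surrogates so that a further fallback exists when the first surrogate is missing too.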