
Chunk #25 — The Methods — How Do Classification and Regression Trees Work? — Splitting and Stopping

Source
An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
Embedded
yes

Text

For selecting the splitting variable and cutpoint, both CART and C4.5 follow the approach of impurity reduction, which we illustrate by means of our smoking data example. In Figure 2, the relative frequencies of both response classes are displayed not only for the terminal nodes but also for the inner nodes of the tree previously presented in Figure 1. Starting from the root node, we find that the relative frequency of “yes” answers in the entire sample of 200 adolescents is approximately 40%. The first split isolates the group of 92 adolescents with the lowest frequency of “yes” answers (approximately 15%, node 2) from the rest, who have a higher frequency of “yes” answers (almost 60%, node 3). These 108 subjects are then split further into two groups: a smaller group with a medium frequency (approximately 30%, node 4) and a larger group with a high frequency (more than 60%, node 5) of “yes” answers to the intention-to-smoke question.
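The impurity reduction achieved by the first split can be made concrete with a small calculation. The following is a minimal sketch, assuming the Gini index as the impurity measure (the measure commonly associated with CART; C4.5 uses an entropy-based criterion instead). The function names are hypothetical, and the proportions are the rounded values quoted in the text, so the result is illustrative rather than exact.

```python
def gini(p_yes: float) -> float:
    """Gini impurity of a two-class node with proportion p_yes of "yes" answers."""
    return 2.0 * p_yes * (1.0 - p_yes)


def impurity_reduction(parent_p: float, children: list[tuple[int, float]]) -> float:
    """Parent impurity minus the size-weighted mean impurity of the child nodes.

    children: list of (number of observations, proportion of "yes") pairs.
    """
    n_total = sum(n for n, _ in children)
    weighted_child = sum(n / n_total * gini(p) for n, p in children)
    return gini(parent_p) - weighted_child


# Rounded values taken from the text (assumption: 40%, 15%, and 60% are used
# as exact proportions, although the text gives them only approximately).
root_p = 0.40
first_split = [(92, 0.15), (108, 0.60)]  # node 2 and node 3

reduction = impurity_reduction(root_p, first_split)
print(f"Gini impurity at the root node: {gini(root_p):.3f}")
print(f"Impurity reduction of the first split: {reduction:.3f}")
```

A positive reduction means the child nodes are, on average, purer than the parent. In this approach the algorithm compares candidate variables and cutpoints on this quantity and selects the split with the largest reduction. The second split is not computed here because the text does not give the sizes of nodes 4 and 5.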