
Chunk #13 — Background — Random forest (RF) — Hyperparameters

Source
Random forest versus logistic regression: a large-scale benchmark experiment.
Embedded
yes

Text

This section presents the most important parameters for RF and their common default values as implemented in the R package randomForest [3] and considered in our study. Note, however, that alternative choices may yield better performance [16, 17] and that parameter tuning for RF needs to be addressed further in future research. The parameter ntree denotes the number of trees in the forest. Strictly speaking, ntree is not a tuning parameter (see [18] for more insight into this issue) and should in principle be as large as possible so that each candidate feature has enough opportunities to be selected. In practice, however, performance reaches a plateau with a few hundred trees for most datasets [18]. The default value is ntree = 500 in the package randomForest. The parameter mtry denotes the number of features randomly selected as candidate features at each split. A low value increases the chance that features with small effects are selected, which may improve prediction performance in cases where they would otherwise be masked by features with large effects. A high value of mtry
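
The roles of ntree and mtry described above can be illustrated with a small counting sketch. It is not taken from the paper or from the randomForest package. It assumes that the mtry candidates at each split are drawn uniformly without replacement from p features, that a feature with a small effect counts as "masked" whenever at least one feature with a large effect is also a candidate, and that splits are independent of each other. The function names and the example values (p = 20, two strong features) are hypothetical.

```python
from math import comb


def prob_candidate(p: int, mtry: int) -> float:
    """Probability that one given feature is among the mtry candidates at a split."""
    return mtry / p


def prob_no_strong_competitor(p: int, mtry: int, n_strong: int) -> float:
    """Given that a weak feature is a candidate, probability that none of the
    n_strong strong features is a candidate at the same split (assumed to mean
    the weak feature is not masked)."""
    # The other mtry - 1 candidates come from the p - 1 remaining features;
    # count the draws that avoid all n_strong strong features.
    return comb(p - 1 - n_strong, mtry - 1) / comb(p - 1, mtry - 1)


def prob_never_candidate(p: int, mtry: int, n_splits: int) -> float:
    """Probability that a given feature is never a candidate in n_splits
    splits, treating the splits as independent (a simplifying assumption)."""
    return (1 - mtry / p) ** n_splits


if __name__ == "__main__":
    p, n_strong = 20, 2  # hypothetical example values
    for mtry in (1, 4, 10, 20):
        print(
            f"mtry={mtry:2d}  "
            f"P(candidate)={prob_candidate(p, mtry):.3f}  "
            f"P(no strong competitor | candidate)="
            f"{prob_no_strong_competitor(p, mtry, n_strong):.3f}"
        )
    # With many splits across many trees, every feature gets its chance.
    print(prob_never_candidate(p, mtry=4, n_splits=500))
```

Under these assumptions, lowering mtry raises the chance that a weak feature, once drawn, does not compete with a strong one, while making it less likely to be drawn at any single split. Growing more trees compensates for the latter, because the probability of never being a candidate shrinks geometrically with the number of splits.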