Chunk #14 — Background — Random forest (RF) — Hyperparameters

Source: Random forest versus logistic regression: a large-scale benchmark experiment.
Embedded: yes

Text

each split. A low value increases the chance of selection of features with small effects, which may contribute to improved prediction performance in cases where they would otherwise be masked by features with large effects. A high value of mtry reduces the risk of having only non-informative candidate features. In the package randomForest, the default value is $\sqrt{p}$ for classification, with p the number of features of the dataset. The parameter nodesize represents the minimum size of terminal nodes. Setting this number larger yields smaller trees. The default value is 1 for classification. The parameter replace refers to the resampling scheme used to randomly draw from the original dataset different samples on which the trees are grown. The default is replace = TRUE, yielding bootstrap samples, as opposed to replace = FALSE, yielding subsamples whose size is determined by the parameter sampsize.
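
To make these defaults concrete, the following is a minimal Python sketch, not the randomForest implementation. The function names default_mtry and draw_indices are hypothetical. The sketch assumes that the square root of p is rounded down to an integer, and that when replace = FALSE and no sampsize is given, about 63.2% of the observations are drawn; both assumptions reflect our reading of the package defaults and should be checked against its documentation.

```python
import math
import random


def default_mtry(p: int) -> int:
    """Number of candidate features per split for classification.

    Assumption: sqrt(p) is rounded down, with a floor of 1.
    """
    return max(1, math.floor(math.sqrt(p)))


def draw_indices(n, replace=True, sampsize=None, rng=None):
    """Draw the row indices on which a single tree would be grown.

    replace=True  -> bootstrap sample (indices may repeat).
    replace=False -> subsample (all indices distinct).
    Assumption: when sampsize is None, use n for the bootstrap and
    ceil(0.632 * n) for subsampling.
    """
    rng = rng or random.Random()
    if sampsize is None:
        sampsize = n if replace else math.ceil(0.632 * n)
    if replace:
        # Sampling with replacement: each draw is independent.
        return [rng.randrange(n) for _ in range(sampsize)]
    # Sampling without replacement: no observation appears twice.
    return rng.sample(range(n), sampsize)


if __name__ == "__main__":
    rng = random.Random(0)
    print("mtry for p = 16:", default_mtry(16))
    boot = draw_indices(100, replace=True, rng=rng)
    sub = draw_indices(100, replace=False, rng=rng)
    print("bootstrap size:", len(boot), "distinct rows:", len(set(boot)))
    print("subsample size:", len(sub), "distinct rows:", len(set(sub)))
```

With replace=True some observations typically appear more than once in the drawn sample and others are left out, whereas with replace=False every drawn observation is distinct and the sample is smaller than the dataset unless sampsize is set to n.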