Chunk #22 — 2. Methods — 2.7. Random Forest Classification Model and Parameters

Source: Random Forest Classification of Alcohol Use Disorder Using fMRI Functional Connectivity, Neuropsychological Functioning, and Impulsivity Measures.

Text

The Random Forest (RF) classification model included 66 DMN connections, 13 neuropsychological scores, and 4 BIS scores as features (83 in total), with group status (AUD or CTL) as the outcome variable. The model was trained on the full sample to identify the features that contributed significantly to classifying the groups. Prediction error and classification accuracy were computed with the out-of-bag (OOB) error method. According to Breiman and Cutler [85], random forests do not require cross-validation or a separate test sample to obtain an unbiased estimate of the test-set error, because this error is estimated internally by the algorithm. Each classification tree is constructed from a different bootstrap sample drawn at random, with replacement, from the training data. About one-third of the training observations are left out of each bootstrap sample; these form the out-of-bag sample for that tree and are used only to estimate the prediction accuracy of the RF model. As each classification tree is grown on its bootstrap sample (which contains approximately two-thirds of the training observations), the OOB error rate is calculated for that tree. The aggregate
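The out-of-bag mechanics described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The data are synthetic, the sizes (120 observations, 5 features) are hypothetical rather than the study's 83 features, and a one-split stump on a randomly chosen feature stands in for a full classification tree. All function names are illustrative. The sketch shows two things: that each bootstrap sample leaves out roughly one-third of the observations, and how votes cast only by trees that did not see an observation are pooled into an OOB error estimate.

```python
import math
import random


def make_data(n=120, p=5, informative=3, shift=2.0, seed=1):
    """Synthetic two-class data (hypothetical sizes, not the study's data)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]  # balanced labels: 0 and 1
    X = [
        [rng.gauss(shift * y[i] if j < informative else 0.0, 1.0) for j in range(p)]
        for i in range(n)
    ]
    return X, y


def fit_stump(X, y, rows, feature):
    """Stand-in for a tree: split one feature at the midpoint of the class means."""
    vals0 = [X[i][feature] for i in rows if y[i] == 0]
    vals1 = [X[i][feature] for i in rows if y[i] == 1]
    if not vals0 or not vals1:  # bootstrap sample happened to hold one class only
        only = 1 if vals1 else 0
        return lambda x: only
    m0 = sum(vals0) / len(vals0)
    m1 = sum(vals1) / len(vals1)
    threshold = (m0 + m1) / 2
    if m1 >= m0:
        return lambda x: 1 if x[feature] > threshold else 0
    return lambda x: 0 if x[feature] > threshold else 1


def oob_error(X, y, n_trees=200, seed=0):
    """Return (aggregate OOB error rate, mean OOB fraction per tree)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [[0, 0] for _ in range(n)]  # per-observation OOB votes for class 0 / 1
    oob_fractions = []
    for _ in range(n_trees):
        # Bootstrap: draw n row indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        oob = [i for i in range(n) if i not in in_bag_set]
        oob_fractions.append(len(oob) / n)
        # Random feature choice, then fit on the in-bag rows only.
        predict = fit_stump(X, y, in_bag, rng.randrange(p))
        # Only observations this learner never saw get a vote from it.
        for i in oob:
            votes[i][predict(X[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    wrong = sum(1 for i in scored if (votes[i][1] > votes[i][0]) != (y[i] == 1))
    return wrong / len(scored), sum(oob_fractions) / n_trees


X, y = make_data()
oob_err, mean_oob_fraction = oob_error(X, y)
print(f"mean OOB fraction per tree: {mean_oob_fraction:.3f} "
      f"(large-sample limit 1/e = {math.exp(-1):.3f})")
print(f"aggregate OOB error rate: {oob_err:.3f}")
```

The one-third figure follows from the chance that a given observation is missed by all n draws, (1 - 1/n)^n, which approaches 1/e as n grows. In practice a library implementation would normally be used instead of hand-rolled code; for example, scikit-learn's RandomForestClassifier reports an OOB accuracy when constructed with oob_score=True.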