Chunk #21 — 2. Materials and Methods — 2.8. Random Forest Classification

Source: Differentiating Individuals with and without Alcohol Use Disorder Using Resting-State fMRI Functional Connectivity of Reward Network, Neuropsychological Performance, and Impulsivity Measures.
Embedded: yes

Text

to see if any of the top variables had associations with age in the individual groups or in the total sample (see Section 3.4); and (ii) since the age difference between the groups was highly significant, including age as a feature in the classification model would likely have artificially inflated the classification accuracy, which would not be desirable. To compute the classification accuracy of the random forest (RF) model, we used the out-of-bag (OOB) error estimate. According to Breiman and Cutler [86], additional cross-validation is not required to obtain an unbiased estimate of the test-set error in random forests, since this error is estimated internally from the OOB observations. The random forest algorithm constructs each decision tree from a separate bootstrap sample of the training data. About one-third of the training observations are left out of each bootstrap sample; these left-out observations, called the OOB sample, serve only as test data for estimating the prediction accuracy of the RF model. While classification trees are grown for each bootstrap sample (which contains approximately two-thirds of the distinct training observations), the OOB error rate is calculated for each classification
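The OOB procedure described above can be illustrated with a minimal sketch. It is not the implementation used in this study. It bags simple one-feature threshold classifiers ("stumps") rather than growing full random-forest trees, omits the random feature selection that random forests perform at each split, and runs on synthetic data. All function names (`bootstrap_indices`, `fit_stump`, `oob_error`) and parameter values are hypothetical and chosen only for illustration.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return the in-bag list and the OOB set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = set(range(n)) - set(in_bag)
    return in_bag, oob


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier (a stand-in for a full tree)."""
    best = None
    for t in sorted(set(xs)):
        for hi_label in (0, 1):
            # Count training errors when values >= t are assigned hi_label.
            errors = sum(
                (hi_label if x >= t else 1 - hi_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, hi_label)
    _, t, hi_label = best
    return lambda x: hi_label if x >= t else 1 - hi_label


def oob_error(xs, ys, n_trees=100, seed=0):
    """Return (OOB error rate, mean fraction of observations left out per tree)."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # per-observation votes for class 0 and 1
    oob_fractions = []
    for _ in range(n_trees):
        in_bag, oob = bootstrap_indices(n, rng)
        stump = fit_stump([xs[i] for i in in_bag], [ys[i] for i in in_bag])
        # Each observation is scored only by trees that did not train on it.
        for i in oob:
            votes[i][stump(xs[i])] += 1
        oob_fractions.append(len(oob) / n)
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    wrong = sum(
        (0 if votes[i][0] >= votes[i][1] else 1) != ys[i] for i in scored
    )
    return wrong / len(scored), sum(oob_fractions) / n_trees


# Synthetic two-class data: one feature, class means 0 and 2, unit variance.
data_rng = random.Random(42)
ys = [i % 2 for i in range(100)]
xs = [data_rng.gauss(2.0 * y, 1.0) for y in ys]

error, mean_oob_fraction = oob_error(xs, ys)
print(f"OOB error rate: {error:.3f}")
print(f"Mean OOB fraction per bootstrap: {mean_oob_fraction:.3f}")
```

The mean OOB fraction should come out close to the "about one-third" figure, since the probability that a given observation is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches e^(-1) ≈ 0.368. Because each observation is classified only by the trees that never saw it, the resulting error rate plays the role of a test-set error without a separate hold-out sample.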