Chunk #22 — 2. Materials and Methods — 2.8. Random Forest Classification

Source: Differentiating Individuals with and without Alcohol Use Disorder Using Resting-State fMRI Functional Connectivity of Reward Network, Neuropsychological Performance, and Impulsivity Measures.
Embedded: yes

Text

form of test data only, to estimate the prediction accuracy of the RF model. A classification tree is grown on each bootstrap sample (which contains approximately two-thirds of the distinct training observations), and the OOB error rate is calculated for each tree built. According to Breiman [87], there are two reasons for using bagging: (i) it enhances accuracy when random features are used; and (ii) it gives ongoing estimates of the generalization error of the combined ensemble of trees, as well as estimates of the strength and correlation. Aggregating the OOB scores over all “ntree” trees (the number of trees preset for the model) provides the ensemble OOB error rate, so the OOB score serves as an internal validation of the RF model. Therefore, unlike many other machine learning algorithms, the random forest method does not require separate training and test data sets when the model is specified. In the current study, “ntree” was set to 1000. The optimal number of features evaluated at each node (“mtry”) was estimated to be 8 using the “tuneRF” function. The final list of variables that significantly contributed to the classification was tabulated and sorted
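
The OOB mechanism described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' pipeline: it assumes a toy data set with a single simulated feature, and it replaces full classification trees with one-threshold "stumps", so there is no per-node feature sampling ("mtry") to show. The names `make_data`, `fit_stump`, and `oob_error` are hypothetical helpers introduced only for this illustration.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Toy data (an assumption): one feature; class 1 tends to have larger values."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)
        data.append((x, label))
    return data


def fit_stump(sample):
    """Return the threshold, taken from the sample's x values, with the fewest errors.

    The stump predicts class 1 when x > threshold, otherwise class 0.
    """
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def oob_error(data, ntree, rng):
    """Grow `ntree` stumps on bootstrap samples and score each on its OOB cases."""
    n = len(data)
    votes = [Counter() for _ in range(n)]  # OOB votes collected per observation
    in_bag_fractions = []
    for _ in range(ntree):
        # Bootstrap sample: n draws with replacement from the training data.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        in_bag_fractions.append(len(in_bag) / n)
        threshold = fit_stump([data[i] for i in idx])
        # Only observations left out of this sample (out-of-bag) receive a vote.
        for i in range(n):
            if i not in in_bag:
                votes[i][int(data[i][0] > threshold)] += 1
    # Ensemble OOB prediction: majority vote over the trees that did not see the case.
    scored = [(votes[i].most_common(1)[0][0], data[i][1])
              for i in range(n) if votes[i]]
    error = sum(pred != y for pred, y in scored) / len(scored)
    return error, sum(in_bag_fractions) / ntree


rng = random.Random(0)
data = make_data(200, rng)
error, in_bag = oob_error(data, ntree=100, rng=rng)
print(f"mean in-bag fraction: {in_bag:.3f}")
print(f"ensemble OOB error:   {error:.3f}")
```

The mean in-bag fraction should come out close to 1 - 1/e ≈ 0.632, which is the "approximately two-thirds" figure quoted in the text. The remaining third of the observations acts as a built-in test set for each tree.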