Chunk #11 — Method — Feature selection and classification model estimation

Source: Predicting risk for Alcohol Use Disorder using longitudinal data with multimodal biomarkers and family history: a machine learning study.

Text
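The repeated 10-fold CV procedure described in this section can be sketched with scikit-learn as follows. Several details are assumptions, not taken from the source:

- The tuned parameter is taken to be the SVM penalty C, chosen by an inner 3-fold grid search over a hypothetical grid.
- Features are standardized within each training fold.
- Folds are stratified by class. The source describes plain random division; stratification is swapped in so that every test fold contains both classes and AUC is always defined.
- The F-score is computed as F1 (beta = 1).
- The data are synthetic placeholders from `make_classification`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def repeated_cv_linear_svm(X, y, n_splits=10, n_repeats=10, seed=0):
    """Repeated k-fold CV of a linear-kernel SVM; returns fold-averaged metrics.

    Assumptions (not stated in the source): the tuned parameter is C, chosen by
    an inner 3-fold grid search over a hypothetical grid, and features are
    standardized inside each training fold. Label 1 = AUD, label 0 = control.
    """
    # The data are reshuffled before each of the n_repeats k-fold splits;
    # stratification (an assumption) keeps both classes in every test fold.
    outer = RepeatedStratifiedKFold(
        n_splits=n_splits, n_repeats=n_repeats, random_state=seed
    )
    scores = {"accuracy": [], "auc": [], "f1": []}
    for train_idx, test_idx in outer.split(X, y):
        # Parameter optimization uses only the nine training groups.
        model = GridSearchCV(
            make_pipeline(StandardScaler(), SVC(kernel="linear")),
            param_grid={"svc__C": [0.01, 0.1, 1.0]},  # hypothetical grid
            cv=3,
        )
        model.fit(X[train_idx], y[train_idx])

        y_true = y[test_idx]
        y_pred = model.predict(X[test_idx])
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        # Accuracy = (TP + TN) / number of classified subjects in the test fold.
        scores["accuracy"].append((tp + tn) / len(y_true))
        # The SVM decision values serve as continuous scores for the ROC curve.
        scores["auc"].append(
            roc_auc_score(y_true, model.decision_function(X[test_idx]))
        )
        # F1 (beta = 1) is an assumption; the source cites its F definition.
        scores["f1"].append(f1_score(y_true, y_pred, zero_division=0))
    # Average each metric over all n_splits * n_repeats test folds.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


# Illustrative run on synthetic data (a stand-in for the study's features).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
results = repeated_cv_linear_svm(X, y)
print(results)
```

Each repetition contributes the same number of folds. Averaging over all 100 test folds therefore gives the same result as averaging within each repetition and then across the 10 repetitions.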
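The standard definitions behind the evaluation metrics in this section are given below for reference. They are textbook definitions, not a reproduction of the authors' cited F equation. FP (controls misclassified as AUD) and FN (AUD subjects misclassified as controls) are not named in this section, and the weighting parameter beta used by the authors is not stated in this excerpt. Setting beta = 1 gives the common F1 score. The first line of the F-score shows why it is a weighted harmonic mean of precision and recall.

```latex
\begin{aligned}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \\
\frac{1}{F_\beta} &= \frac{1}{1+\beta^2}\left(\frac{1}{\mathrm{Precision}} + \frac{\beta^2}{\mathrm{Recall}}\right)
\;\Longrightarrow\;
F_\beta = \frac{(1+\beta^2)\,\mathrm{Precision}\cdot\mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}, \\
F_1 &= \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\end{aligned}
```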

A linear-kernel SVM was trained to distinguish between the two groups using a 10-fold cross-validation (CV) procedure that included parameter optimization. For the 10-fold CV, subjects were randomly divided into ten equal-sized groups; a classifier was trained on nine of the groups and tested on the held-out group. The entire dataset was shuffled before each division into folds to ensure random assignment of subjects to groups. Because the random division affects the classification results, we repeated this process 10 times and averaged the results across repetitions.

To evaluate model performance, we recorded the number of true positives (TP, the number of correctly classified AUD subjects) and true negatives (TN, the number of correctly classified controls). Classification accuracy was computed as the sum of TP and TN divided by the total number of classified subjects. The area under the ROC curve (AUC) [3] and F-scores were also used to evaluate the classification models. The F-score was defined following [10, 11, 39] and can be interpreted as a weighted harmonic mean of precision and recall [39]. The precision is defined as the number of true positives