Chunk #50 — Results — Explaining differences: datasets’ meta-features — Subgroup analyses: substantive context

Source: Random forest versus logistic regression: a large-scale benchmark experiment.

Text

Furthermore, we conduct additional analyses restricted to the subgroup of datasets from the field of biosciences/medicine. Of the 243 datasets considered so far, 67 belong to this field. Versions of Figs. 3 and 5 and Table 2 (as well as of Fig. 6, discussed in the “Meta-learning” section) computed on this biosciences/medicine subgroup are displayed in Additional file 2. The outperformance of RF over LR is only slightly smaller for datasets from biosciences/medicine than for datasets from other fields: the difference in outperformance between the two groups is not significantly different from 0. Note that one may expect larger differences between specific subfields of biosciences/medicine, depending on the prediction task considered. Such investigations, however, would require subject-matter knowledge of each of these tasks; they could be conducted in future studies by experts in the respective tasks (see also the “Discussion” section).
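The passage above does not state which test underlies the statement that the difference between the two groups is "not significantly different from 0". As an illustration only, the following minimal sketch assumes a two-sided permutation test on the difference in mean per-dataset outperformance (for instance, accuracy of RF minus accuracy of LR) between the biosciences/medicine datasets and all other datasets. The function names `diff_in_means` and `permutation_test` and the example values are hypothetical and are not taken from the benchmark; this is not presented as the authors' procedure.

```python
import random
from statistics import mean


def diff_in_means(group_a, group_b):
    """Mean of group_a minus mean of group_b."""
    return mean(group_a) - mean(group_b)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group labels are shuffled n_permutations times. The p-value is the
    share of shuffles whose absolute difference in means is at least as
    large as the observed one, with a +1 correction so it is never 0.
    Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    observed = diff_in_means(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of datasets to the two groups
        perm_diff = diff_in_means(pooled[:n_a], pooled[n_a:])
        if abs(perm_diff) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical per-dataset differences (RF minus LR); NOT values from the benchmark.
delta_bio = [0.02, 0.05, -0.01, 0.03, 0.04]
delta_other = [0.03, 0.06, 0.01, 0.02, 0.05, 0.04]
observed, p = permutation_test(delta_bio, delta_other, n_permutations=2000)
print(f"observed difference = {observed:.4f}, p = {p:.3f}")
```

A large p-value here would correspond to the situation described in the text: the subgroup's mean outperformance is slightly lower, but the gap is compatible with random variation across datasets. A permutation test is used in this sketch because it makes no normality assumption about the per-dataset differences; a rank-based test such as the Wilcoxon rank-sum test would be a reasonable alternative.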