Chunk #37 — Results — Included datasets

Source: Random forest versus logistic regression: a large-scale benchmark experiment.
Embedded: yes

Text

From the approximately 20,000 datasets currently available on OpenML [26], we select those that pose binary classification problems. We then remove datasets that contain missing values, obviously simulated datasets, and duplicated datasets. We also remove datasets with more features than observations (p > n) and datasets that fail to load. This leaves a total of 273 datasets; see Fig. 2 for an overview.
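
The selection steps above can be sketched as a chain of filters over dataset metadata. This is only an illustration: the `DatasetMeta` record, its field names, and the `select_datasets` helper are hypothetical and are neither the OpenML API nor the authors' code. The text does not say how simulated or duplicated datasets were identified, so the sketch assumes a manually set `simulated` flag and treats a repeated name as a duplicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical metadata record; field names are assumptions."""
    name: str
    n_classes: int
    n_missing: int          # number of missing values in the dataset
    n_features: int         # p
    n_observations: int     # n
    simulated: bool = False  # assumed to be flagged by manual inspection
    loads_ok: bool = True    # False if loading the dataset raised an error


def select_datasets(candidates):
    """Apply the exclusion criteria in order and return the kept datasets."""
    selected = []
    seen_names = set()
    for d in candidates:
        if d.n_classes != 2:              # keep binary classification only
            continue
        if d.n_missing > 0:               # drop datasets with missing values
            continue
        if d.simulated:                   # drop obviously simulated datasets
            continue
        if d.name in seen_names:          # drop duplicates (by name: a simplification)
            continue
        if d.n_features > d.n_observations:  # drop p > n
            continue
        if not d.loads_ok:                # drop datasets with loading errors
            continue
        seen_names.add(d.name)
        selected.append(d)
    return selected
```

Note that a dataset with exactly as many features as observations (p = n) passes the filter, since only p > n is excluded.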