Chunk #37 — Results — Included datasets

Source
Random forest versus logistic regression: a large-scale benchmark experiment.
Embedded
yes

Text

From approximately 20,000 datasets currently available from OpenML [26], we select those featuring binary classification problems. We then remove datasets that include missing values, obviously simulated datasets, and duplicated datasets. We also remove datasets with more features than observations (p > n) and datasets with loading errors. This leaves a total of 273 datasets. See Fig. 2 for an overview.
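
The exclusion criteria above can be read as a sequence of filters over dataset metadata. The following is a minimal sketch of that idea, not the authors' actual selection code. The `DatasetMeta` record, its field names, and the name-based duplicate check are all assumptions made for illustration; in particular, whether a dataset is "obviously simulated" or fails to load is assumed here to be supplied as a precomputed flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    n_classes: int
    n_missing: int
    n_features: int           # p
    n_observations: int       # n
    simulated: bool = False   # assumed flag, e.g. set after manual inspection
    loads_ok: bool = True     # assumed flag, result of trying to load the data


def select_datasets(candidates):
    """Keep datasets that pass every exclusion criterion listed in the text."""
    kept = []
    seen_names = set()  # names of datasets already retained
    for d in candidates:
        if d.n_classes != 2:                  # binary classification only
            continue
        if d.n_missing > 0:                   # no missing values
            continue
        if d.simulated:                       # no obviously simulated data
            continue
        if d.name in seen_names:              # crude duplicate check by name
            continue
        if d.n_features > d.n_observations:   # drop p > n
            continue
        if not d.loads_ok:                    # drop datasets with loading errors
            continue
        seen_names.add(d.name)
        kept.append(d)
    return kept


# Toy candidates, invented for this example (not real OpenML datasets).
toy = [
    DatasetMeta("a", 2, 0, 10, 100),
    DatasetMeta("a", 2, 0, 10, 100),                   # duplicate of "a"
    DatasetMeta("b", 3, 0, 10, 100),                   # three classes
    DatasetMeta("c", 2, 5, 10, 100),                   # missing values
    DatasetMeta("d", 2, 0, 200, 100),                  # p > n
    DatasetMeta("e", 2, 0, 10, 100, simulated=True),   # simulated
    DatasetMeta("f", 2, 0, 10, 100, loads_ok=False),   # loading error
    DatasetMeta("g", 2, 0, 100, 100),                  # p == n is kept
]

selected = select_datasets(toy)
print([d.name for d in selected])  # prints ['a', 'g']
```

Note that the p > n rule is strict in this sketch, so a dataset with exactly as many features as observations is retained. Detecting duplicates by name alone is a simplification; the text does not say how duplicates were identified.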