Chunk #4 — Introduction

Source: Random forest versus logistic regression: a large-scale benchmark experiment.
Embedded: yes

Text

Comparison studies published in literature often include a large number of methods but a relatively small number of datasets [5], yielding an ill-posed problem as far as statistical interpretation of benchmarking results are concerned. In the present paper we take an opposite approach: we focus on only two methods for the reasons outlined above but design our benchmarking experiments in such a way that it yields solid evidence. A particular strength of our study is that we as authors are equally familiar with both methods. Moreover, we are “neutral” in the sense that we have no personal priori preference for one of the methods: ALB published a number of papers on RF, but also papers on regression-based approaches [6, 7] and papers pointing to critical problems of RF [8–10]. Neutrality and equal expertise would be much more difficult if not impossible to ensure if several variants of RF (including tuning strategies) and logistic regression were included in the study. Further discussions of the concept of authors’ neutrality can be found elsewhere [5, 11].