To further explore this issue over all 243 investigated datasets, we compute Spearman’s correlation coefficient between the difference in accuracy between random forest and logistic regression (Δacc) and various meta-features of the datasets. The results of Spearman’s correlation test are shown in Table 3. These analyses again point to the importance of the number p of features (and related meta-features), while the dataset size n is not significantly correlated with Δacc. The percentage C_max of observations in the majority class, which was identified as influencing the relative performance of RF and LR in a previous study [39] conducted on a dataset from the field of political science, is also not significantly correlated with Δacc in our study. Note that our results are averaged over a large number of different datasets: they are not incompatible with the existence of an effect in some cases.

Table 3 Correlation between Δacc and dataset’s features

Meta-feature      Spearman’s ρ    p-value
n                 -0.0338         6.00·10^-1
p                  0.331          1.32·10^-7
p/n                0.254          6.39·10^-5
d                  0.258          4.55·10^-5
d/n                0.246          1.04·10^-4
p_numeric          0.254          6.09·10^-5
p_categorical     -0.076          2.37·10^-1
p_numeric,rate     0.240          1.54·10^-4
C_max              0.00735        9.10·10^-1
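As an illustration of the kind of analysis reported in Table 3, the following minimal sketch computes Spearman’s ρ between a meta-feature and Δacc, using only the Python standard library. The data are synthetic placeholders, not the values from the study, and all function and variable names (`average_ranks`, `spearman_rho`, `permutation_p_value`, `p_features`, `delta_acc`) are hypothetical. The p-value here comes from a permutation test, which is an assumption made for self-containedness and not necessarily the test used to produce Table 3.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (an assumed choice)."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-in for 243 datasets: NOT the data from the study.
rng = random.Random(42)
p_features = [rng.randint(2, 500) for _ in range(243)]
delta_acc = [0.0001 * p + rng.gauss(0, 0.03) for p in p_features]

rho = spearman_rho(p_features, delta_acc)
p_val = permutation_p_value(p_features, delta_acc)
print(f"Spearman's rho = {rho:.3f}, permutation p-value = {p_val:.4f}")
```

Because Spearman’s ρ depends only on ranks, it captures any monotonic association between a meta-feature and Δacc without assuming linearity, which suits meta-features such as p or n whose values span several orders of magnitude.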