respectively). SVM and LR classifiers were each evaluated with and without automatic dimensionality reduction via principal component analysis (PCA), retaining the minimal number of components that explained 90% of the variance. Hyper-parameters for SVM (linear and non-linear), LR and XGB were optimized using nested cross-validation (CV); RFC and GPC were tuned following recommendations. Details on missing-data handling, model implementation and hyper-parametrization are given in the Supplementary Methods, and Supplementary Fig. S1 assesses different imputation strategies. The primary performance metric was the area under the receiver operating characteristic (ROC) curve (AUC), and all reported metrics were averaged across CV folds33. Balanced accuracy, sensitivity and specificity are reported in the supplement.
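The evaluation scheme described above (optional PCA at 90% explained variance, inner-loop hyper-parameter search, outer-loop AUC averaged across folds) can be illustrated with a minimal sketch. This assumes scikit-learn, standardization before PCA, 5 outer and 3 inner stratified folds, grid search, and small illustrative parameter grids; none of these choices come from the text, whose actual implementation is described in the Supplementary Methods. The helper names `build_pipeline` and `nested_cv_metrics` are hypothetical, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pipeline(estimator, use_pca):
    """Hypothetical helper: scaling, optional PCA, then the classifier."""
    steps = [("scale", StandardScaler())]  # assumption: scale before PCA
    if use_pca:
        # A float in (0, 1) makes PCA keep the fewest components whose
        # cumulative explained variance exceeds that fraction (here 90%).
        steps.append(("pca", PCA(n_components=0.90)))
    steps.append(("clf", estimator))
    return Pipeline(steps)


# Sensitivity = recall of the positive class; specificity = recall of class 0.
SCORING = {
    "auc": "roc_auc",
    "balanced_accuracy": "balanced_accuracy",
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
}


def nested_cv_metrics(X, y, estimator, param_grid, use_pca, seed=0):
    """Hypothetical helper: nested CV, metrics averaged over outer folds."""
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Inner loop: hyper-parameters chosen by AUC on the training part only.
    search = GridSearchCV(
        build_pipeline(estimator, use_pca), param_grid, scoring="roc_auc", cv=inner
    )
    # Outer loop: each fold refits the whole search, so PCA and tuning
    # never see the held-out fold.
    scores = cross_validate(search, X, y, cv=outer, scoring=SCORING)
    return {name: scores[f"test_{name}"].mean() for name in SCORING}


# Illustrative grids only (assumptions, not the study's search spaces).
MODELS = {
    "LR": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "SVM-linear": (SVC(kernel="linear"), {"clf__C": [0.1, 1.0, 10.0]}),
    "SVM-rbf": (SVC(kernel="rbf"), {"clf__C": [0.1, 1.0, 10.0], "clf__gamma": ["scale", 0.01]}),
}

if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
    for name, (est, grid) in MODELS.items():
        for use_pca in (False, True):
            res = nested_cv_metrics(X, y, est, grid, use_pca)
            print(f"{name:10s} PCA={use_pca!s:5s} AUC={res['auc']:.3f}")
```

Placing PCA inside the pipeline means the number of retained components is re-determined on each training split, which keeps the 90% variance threshold from leaking information from the held-out fold into model selection.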