Single-site classification performance under 10-fold cross-validation (CV) varied greatly, with AUCs ranging from 0.30 to 0.89 across sites and classifiers (see supplementary Table S5). Figure 2 summarizes RFC performance for each individual site. We assessed the correlation between the number of participants at each site and that site's classification performance (AUC averaged over CV folds); this correlation was positive and significant (rS = 0.37, p = 0.014). In addition, we investigated the relationship between single-site classification performance and the following clinical variables of interest: the mean and standard deviation of AO, duration, and severity, and the proportion of medicated patients and its standard deviation. None of these clinical variables showed a significant correlation with classification performance.
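The site-level correlation reported above (rS) can be illustrated with a minimal sketch of Spearman's rank correlation between site sample size and fold-averaged AUC. This is not the code used in the study: the function names are illustrative, and the site sizes and AUC values are hypothetical placeholders, not data from Table S5. The sketch computes only the coefficient, not the p-value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example values (NOT from the study): one entry per site.
site_sizes = [20, 35, 50, 80, 120, 200]               # participants per site
site_mean_auc = [0.48, 0.55, 0.52, 0.61, 0.58, 0.70]  # AUC averaged over CV folds

rho = spearman(site_sizes, site_mean_auc)
print(f"Spearman rho = {rho:.2f}")
```

The same function could be applied to each clinical variable in turn (for example, a per-site mean severity) in place of `site_sizes`. A rank-based coefficient is a natural choice here because it does not assume a linear relationship between site size and AUC.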