types were predicted as described above using the gene expression assay. We computed nearest neighbors in each reduced-dimension space using the RANN R package (https://CRAN.R-project.org/package=RANN). For each dimension reduction method, we used the first 20 components, excluding any component with a correlation >0.9 with the total counts in each cell (dimension 1 for LSI (Signac) and dimension 2 for SnapATAC). For SCALE, an autoencoder-based method, we used the entire latent space (ten dimensions). As an additional performance metric, we computed the Silhouette score at each downsampling level using the silhouette function in the cluster package in R. For both the k-NN purity metric and the Silhouette score, we computed the mean score for each cell type rather than a single mean across all cells. This prevented the metrics from being biased toward the performance of each dimension reduction method on the most abundant cell types in the dataset.
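
The component filtering and per-cell-type averaging described above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the code used in the analysis, which relied on the RANN and cluster R packages. Several details are assumptions: k-NN purity is taken to be the fraction of a cell's k nearest neighbors that share its predicted cell type, the correlation threshold is applied to the absolute Pearson correlation, neighbors are found by brute-force Euclidean search, and all function names (`drop_depth_components`, `knn_purity_by_type`, `macro_average`) are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_depth_components(embedding, total_counts, n_components=20, max_cor=0.9):
    """Keep the first n_components columns, minus those tracking sequencing depth.

    Assumption: the threshold is applied to the absolute correlation.
    """
    n_dims = min(n_components, len(embedding[0]))
    keep = [
        j for j in range(n_dims)
        if abs(pearson([row[j] for row in embedding], total_counts)) <= max_cor
    ]
    return [[row[j] for j in keep] for row in embedding]


def knn_purity_by_type(embedding, labels, k):
    """Mean k-NN purity per cell type (assumed definition, brute-force search)."""
    per_type = defaultdict(list)
    for i, cell in enumerate(embedding):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted(
            (math.dist(cell, other), j)
            for j, other in enumerate(embedding) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        # Fraction of the k neighbors that share cell i's label.
        purity = sum(labels[j] == labels[i] for j in neighbors) / k
        per_type[labels[i]].append(purity)
    return {t: sum(v) / len(v) for t, v in per_type.items()}


def macro_average(per_type_scores):
    """Unweighted mean over cell types, so abundant types do not dominate."""
    return sum(per_type_scores.values()) / len(per_type_scores)
```

Averaging within each cell type first, and only then across types, gives a rare cell type the same weight as an abundant one. The same grouping step would apply to per-cell Silhouette widths.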