We further assessed how well local cell neighborhoods were preserved in each downsampled dataset using two metrics, with cell types annotated using the independent gene expression assay. First, for each cell we computed the fraction of its k-nearest neighbors (k-NN) that belonged to the same cell type as the query cell, and averaged this fraction within each cell type (mean k-NN purity per cell type; k = 100, with additional values of k shown in Supplementary Fig. 5). Second, we computed the average Silhouette score for each cell type. These analyses revealed a gradual decline in local structure preservation as fewer counts were retained from the original dataset, with a greater decline seen when using the original LSI method, SnapATAC and cisTopic (Fig. 3c,d). To test whether these results generalize to other datasets, we repeated a similar analysis using a series of synthetic scATAC-seq human bone marrow cell datasets generated in a recent benchmarking study52, and obtained similar results (Supplementary Fig. 6). These results indicate that LSI, when applied with an appropriate TF-IDF method, can be a powerful dimension reduction technique for single-cell DNA accessibility data.
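
As an illustration, the two metrics described above could be computed along the following lines. This is a minimal sketch in Python using scikit-learn, not the implementation used in this study. The function names, the `embedding` (cells × dimensions) and `labels` (one cell type per cell) inputs, and the use of Euclidean distance are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import silhouette_samples
from sklearn.neighbors import NearestNeighbors


def knn_purity_per_type(embedding, labels, k=100):
    """Mean fraction of same-type cells among each cell's k nearest neighbors.

    `embedding` and `labels` are hypothetical inputs: a reduced-dimension
    matrix (cells x dimensions) and the independent cell type annotation.
    """
    labels = np.asarray(labels)
    nn = NearestNeighbors(n_neighbors=k).fit(embedding)
    # Calling kneighbors() without a query matrix returns neighbors of the
    # fitted points, excluding each point itself.
    idx = nn.kneighbors(return_distance=False)
    # Per-cell purity: fraction of neighbors sharing the query cell's type.
    per_cell = (labels[idx] == labels[:, None]).mean(axis=1)
    # Average the per-cell values within each cell type.
    return {t: float(per_cell[labels == t].mean()) for t in np.unique(labels)}


def silhouette_per_type(embedding, labels):
    """Mean Silhouette score of the cells in each cell type."""
    labels = np.asarray(labels)
    scores = silhouette_samples(embedding, labels)  # Euclidean by default
    return {t: float(scores[labels == t].mean()) for t in np.unique(labels)}
```

Applying both functions to the embedding obtained at each downsampling level, with the same labels throughout, would give one purity value and one Silhouette value per cell type and level, which can then be compared across methods.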