The success of atomic sketch integration rests on identifying a representative subset of cells for each dataset. Sketching techniques for single-cell analysis aim to find subsamples that preserve the overall geometry of these datasets54-56. These methods do not require pre-clustering of the data, but aim to ensure that the sketched dataset represents both rare and abundant cell states, even after downsampling. Here, we perform sketching using a leverage-score-based sampling strategy that has been proposed for large-scale information retrieval problems57 and can be computed rapidly and efficiently on sparse datasets. Leverage-score-based sampling does not require performing principal components analysis, yet it retains the ability to efficiently identify cells from rare subpopulations when compared with geometric sketching techniques54 (Supplementary Fig. 5a,b). We emphasize that atomic sketch integration represents a general strategy for improving scalability that can be broadly coupled with existing methods. For example, a wide variety of integration techniques, including Harmony38, Scanorama40, mnnCorrect39, scVI41 and Seurat19, can be used to integrate the atom elements in each dictionary, with our procedure then enabling these results to be extended to full datasets.
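
To make the idea of leverage-score sampling concrete, the following is a minimal sketch in Python and is not the implementation used in this work. It assumes a small dense cells-by-features matrix with full column rank and computes exact leverage scores from a thin QR decomposition, whereas sparse, atlas-scale data would call for a faster approximation. The function names `leverage_scores` and `sketch_cells` and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np


def leverage_scores(X):
    """Exact leverage score of each row of X (cells x features).

    Assumes X is dense and has full column rank (illustrative only).
    """
    # Thin QR: the columns of Q are an orthonormal basis for the column space of X.
    Q, _ = np.linalg.qr(X, mode="reduced")
    # The leverage of row i is the squared Euclidean norm of row i of Q.
    return np.sum(Q ** 2, axis=1)


def sketch_cells(X, n_keep, seed=0):
    """Sample n_keep row indices without replacement, weighted by leverage."""
    rng = np.random.default_rng(seed)
    scores = leverage_scores(X)
    probs = scores / scores.sum()
    return rng.choice(X.shape[0], size=n_keep, replace=False, p=probs)


# Toy data: 1,000 "abundant" cells that vary along features 0-4 and
# 10 "rare" cells that vary along features 5-7.
rng = np.random.default_rng(1)
X = np.zeros((1010, 8))
X[:1000, :5] = rng.normal(size=(1000, 5))
X[1000:, 5:] = rng.normal(size=(10, 3))

scores = leverage_scores(X)
kept = sketch_cells(X, n_keep=100)

print("mean leverage, abundant cells:", scores[:1000].mean())
print("mean leverage, rare cells:", scores[1000:].mean())
print("rare cells retained in sketch:", int(np.sum(kept >= 1000)))
```

In this toy example the rare cells are the only rows that span features 5-7, so each of them receives a much larger leverage score than a typical abundant cell and is far more likely to be retained than under uniform downsampling.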