We reasoned that dictionary learning could also enable efficient, large-scale integrative analysis. We first select a representative sketch of cells (i.e., 5,000 cells) from each dataset and treat these cells as atoms in a dictionary (Fig. 4a, Supplementary Methods). We next learn a dictionary representation for each dataset: a weighted linear combination of atoms that reconstructs every cell in the full dataset. Both steps run independently for each dataset, which allows efficient processing. We then integrate the atoms from all datasets. This is the only step that analyzes cells from multiple datasets simultaneously, but because it considers only the atoms, it does not pose a scalability challenge. Finally, for each dataset individually, we apply the previously learned dictionary representation to its harmonized atoms to reconstruct harmonized profiles for the full dataset. We refer to this procedure as ‘atomic sketch integration’. We highlight that in this application, the ‘atoms’ used to reconstruct a dataset are a subset of cells from the dataset itself. By contrast, in bridge integration, the atoms are cells from a different (multi-omic) dataset.
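
The four steps of this procedure can be illustrated with a minimal NumPy sketch. It is not the implementation described in the Supplementary Methods, and every function name in it is hypothetical. It relies on three simplifying assumptions: atoms are chosen by uniform random sampling, dictionary weights are fit by ordinary least squares with each cell's weights constrained to sum to one, and the integration step is replaced by a placeholder that shifts each dataset's atoms to a shared mean. Any real integration method could take the place of `harmonize_atoms`.

```python
import numpy as np


def select_sketch(X, n_atoms, rng):
    """Step 1: pick atom indices. Uniform sampling is an assumed stand-in."""
    return rng.choice(X.shape[0], size=n_atoms, replace=False)


def learn_dictionary_weights(X, atoms):
    """Step 2: least-squares weights W such that W @ atoms approximates X.

    A constant column is appended to both matrices so that each cell's
    weights sum to 1 (an assumption of this sketch). A shift applied to
    the atoms then carries over unchanged to the reconstructed cells.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    A_aug = np.hstack([atoms, np.ones((atoms.shape[0], 1))])
    # Solve A_aug.T @ W.T = X_aug.T; lstsq returns the minimum-norm solution.
    W_t, *_ = np.linalg.lstsq(A_aug.T, X_aug.T, rcond=None)
    return W_t.T


def harmonize_atoms(atom_sets):
    """Step 3 (placeholder): shift each dataset's atoms to a shared mean."""
    global_mean = np.mean([a.mean(axis=0) for a in atom_sets], axis=0)
    return [a - a.mean(axis=0) + global_mean for a in atom_sets]


def atomic_sketch_integration(datasets, n_atoms, seed=0):
    """Run steps 1-4 on a list of (cells x features) arrays."""
    rng = np.random.default_rng(seed)
    atom_sets, weights = [], []
    # Steps 1 and 2 touch one dataset at a time.
    for X in datasets:
        atoms = X[select_sketch(X, n_atoms, rng)]
        atom_sets.append(atoms)
        weights.append(learn_dictionary_weights(X, atoms))
    # Step 3 is the only step that sees several datasets at once,
    # and it only ever sees the atoms.
    harmonized = harmonize_atoms(atom_sets)
    # Step 4: reuse each dataset's weights on its harmonized atoms.
    return [W @ A for W, A in zip(weights, harmonized)]
```

Calling `atomic_sketch_integration([X1, X2], n_atoms=30)` on two arrays with the same number of columns returns one harmonized array per input, each with the same shape as its input. In this sketch the joint step operates on `n_atoms` rows per dataset regardless of how many cells each dataset contains, which mirrors the scalability argument above.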