LSI involves two steps. First, we computed the TF-IDF matrix from the count matrix. Term frequency was defined as TF = Cᵢⱼ/Fⱼ, where Cᵢⱼ was the total number of counts for peak i in cell j and Fⱼ was the total number of counts for cell j. Inverse document frequency was defined as IDF = N/nᵢ, where N was the total number of cells in the dataset and nᵢ was the total number of counts for peak i across all cells. The TF-IDF matrix was then computed as TF-IDF = log(1 + (TF × IDF) × 10⁴). For comparison with alternative LSI methods⁴, we also computed IDF as IDF = log(1 + N/nᵢ), and subsequently TF-IDF as either TF × IDF (for ‘Cusanovich2018’) or log(TF) × IDF (for ‘log-TF’; http://andrewjohnhill.com/blog/2019/05/06/dimensionality-reduction-for-scatac-data/). These variants were implemented in the RunTFIDF function in Signac, with the ‘method’ argument selecting the TF-IDF variant. Second, we decomposed the resulting TF-IDF matrix via truncated singular value decomposition using the irlba R package (https://cran.r-project.org/package=irlba)⁶⁰, as implemented in the RunSVD function in Signac.
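As an illustration of the three TF-IDF formulas above, the following is a minimal Python sketch operating on a small dense peak-by-cell count matrix. It is not the Signac implementation: the function name `tf_idf`, the string method labels and the `scale` parameter are hypothetical names chosen for this example. The sketch assumes that every peak and every cell has at least one count, and it assumes that zero entries stay zero under the ‘log-TF’ variant, because log(TF) is undefined there and the text does not specify the handling. The truncated singular value decomposition step is omitted.

```python
import math


def tf_idf(counts, method="log-tfidf", scale=1e4):
    """Compute a TF-IDF matrix from a peak-by-cell count matrix.

    counts: list of rows, one row per peak, one column per cell.
    method: hypothetical label, one of "log-tfidf", "cusanovich2018", "log-tf".
    """
    n_peaks = len(counts)
    n_cells = len(counts[0])  # N: total number of cells
    # F_j: total number of counts for cell j (column sums)
    cell_totals = [sum(counts[i][j] for i in range(n_peaks)) for j in range(n_cells)]
    # n_i: total number of counts for peak i across all cells (row sums)
    peak_totals = [sum(row) for row in counts]

    result = []
    for i, row in enumerate(counts):
        ratio = n_cells / peak_totals[i]  # N / n_i
        # The default variant uses IDF = N/n_i; the alternatives use log(1 + N/n_i)
        idf = ratio if method == "log-tfidf" else math.log(1 + ratio)
        new_row = []
        for j, c in enumerate(row):
            tf = c / cell_totals[j]  # TF = C_ij / F_j
            if method == "log-tfidf":
                value = math.log(1 + tf * idf * scale)
            elif method == "cusanovich2018":
                value = tf * idf
            elif method == "log-tf":
                # Assumption: zero counts are kept at zero, since log(0) is undefined
                value = math.log(tf) * idf if tf > 0 else 0.0
            else:
                raise ValueError(f"unknown method: {method}")
            new_row.append(value)
        result.append(new_row)
    return result


# Toy matrix: 3 peaks (rows) by 3 cells (columns)
counts = [
    [1, 0, 2],
    [0, 3, 1],
    [2, 1, 0],
]
default_tfidf = tf_idf(counts)
cusanovich_tfidf = tf_idf(counts, method="cusanovich2018")
log_tf_tfidf = tf_idf(counts, method="log-tf")
```

In the default variant, zero counts map to log(1 + 0) = 0, so the sparsity pattern of the count matrix is preserved. In practice the count matrix would be stored in a sparse format rather than as nested lists.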