Chunk #12 — Results — Evaluation of scATAC-seq dimension reduction methods.

Source: Single-cell chromatin state analysis with Signac.
Embedded: yes

Text

LSI was originally developed for natural language processing [47] and uses a term frequency-inverse document frequency (TF-IDF) scheme to weight features according to their frequency in a document and their frequency across all documents in a text corpus. LSI has since been applied to the analysis of single-cell chromatin data, where a cell is analogous to a document and a term is analogous to a genomic region [4]. The most popular TF-IDF method applied to single-cell chromatin data computes the term frequency as TF = C_ij / F_j, where C_ij is the total number of counts for peak i in cell j and F_j is the total number of counts for cell j. The inverse document frequency is typically computed as IDF = log(1 + N / n_i), where N is the total number of cells in the dataset and n_i is the total number of counts for peak i across all cells. The TF-IDF matrix is then computed as TF × IDF. We found that, when applied to scATAC-seq data, this implementation often results in nonzero values in the TF-IDF matrix having low variance and