Chunk #15 — Results — Robustness and benchmarking analysis

Source: Dictionary learning for integrative, multimodal and scalable single-cell analysis.
Embedded: yes

Text

Because our strategy relies on the ability of the dictionary to represent and reconstruct individual datasets, we explored how the size and composition of the multi-omic dataset affected the accuracy of integration. We sequentially downsampled the multi-omic dataset, repeated bridge integration, and compared the results with our original findings. Downsampling the bridge generally returned results that were concordant with the full analysis but, as expected, could affect annotation concordance for rare cell types, which are the most sensitive to downsampling (Fig. 3a). We found that a bridge dataset containing at least 50 cells (‘atoms’) representing a given cell type was sufficient for robust integration. This threshold is not a strict requirement: integration can succeed for rare cell types such as ASDC even when fewer than ten cells are present in the bridge, but we also observed failure modes in this regime. We note that generating bridge datasets consisting of more than 50 cells per subpopulation is quite feasible for many multi-omic technologies, and that our findings represent guidelines to assist in experimental