Chunk #73 — STAR Methods — QUANTIFICATION AND STATISTICAL ANALYSIS — ICA based analysis and clustering

Source: Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain.
Embedded: yes

Text

Analysis of Drop-seq data was performed using two iterative rounds of independent component analysis (ICA) on each of the nine tissue regions separately (first round, “Global clustering”; second round, “Subclustering”). Function definitions and parameter settings of all operations performed are provided. In the first stage, digital gene expression matrices were column-normalized. Cells with fewer than 400 expressed genes were removed from analysis. To identify a set of highly variable genes, we first calculated the mean and variance of each gene, and selected genes that were: (1) 0.1 log10 units above the expected variance for a perfectly Poisson-distributed gene of equivalent mean expression; and (2) above a Bonferroni-corrected 99% confidence interval defined by a normal approximation of a Poisson distribution. These selected genes were then centered and scaled across all cells, and ICA was performed with 60 components (except for cerebellum, where only 30 components were used), using the fastICA package in R. Clustering of these components was performed by a process very similar to that of the R package Seurat (Gierahn et al., 2017; Satija et al., 2015): a shared