
Chunk #56 — Methods — Gene expression data processing

Source
Genetic identification of cell types underlying brain complex traits yields insights into the etiology of Parkinson's disease.
Embedded
yes

Text

All datasets were processed uniformly. First, we computed the mean expression of each gene in each cell type from the single-cell expression data (if this statistic was not provided by the authors). For the GTEx dataset, we used the pre-computed median expression across individuals and excluded tissues that were sampled in fewer than 100 individuals, non-natural tissues (e.g. EBV-transformed lymphocytes), and testis (an outlier in hierarchical clustering). We then averaged the expression of tissues by organ (with the exception of brain tissues), resulting in gene expression profiles for a total of 37 tissues. For all datasets, we filtered out genes with non-unique names, genes not expressed in any cell type, non-protein-coding genes, and, for mouse datasets, genes that had no expert-curated 1:1 ortholog between mouse and human (Mouse Genome Informatics, The Jackson Laboratory, version 11/22/2016). Gene expression was then scaled to a total of 1M UMIs (or transcripts per million (TPM)) for each cell type/tissue. We then calculated a metric of gene expression specificity by dividing the expression of each gene in each cell type by the
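
The averaging, expression filtering, and scaling steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, gene names, cell IDs, and toy counts are all hypothetical. It assumes per-cell UMI counts are available as nested dictionaries, and it covers only three steps: the per-cell-type mean, removal of genes not expressed in any cell type, and scaling each cell type to a total of 1M. The annotation-based filters (unique names, protein-coding status, 1:1 orthologs) and the specificity metric are not sketched here.

```python
from collections import defaultdict


def mean_expression_by_cell_type(counts, cell_types):
    """Average per-cell counts within each cell type.

    counts:     {cell_id: {gene: count}}   (hypothetical input layout)
    cell_types: {cell_id: cell_type_label}
    Returns {cell_type: {gene: mean count}}; a gene missing from a cell
    is treated as having zero counts in that cell.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for cell_id, gene_counts in counts.items():
        label = cell_types[cell_id]
        n_cells[label] += 1
        for gene, value in gene_counts.items():
            sums[label][gene] += value
    return {
        label: {gene: total / n_cells[label] for gene, total in genes.items()}
        for label, genes in sums.items()
    }


def drop_unexpressed_genes(profiles):
    """Remove genes whose mean expression is zero in every cell type."""
    expressed = {
        gene
        for genes in profiles.values()
        for gene, value in genes.items()
        if value > 0
    }
    return {
        label: {gene: value for gene, value in genes.items() if gene in expressed}
        for label, genes in profiles.items()
    }


def scale_to_total(profiles, total=1_000_000):
    """Rescale each cell type so its expression values sum to `total`."""
    scaled = {}
    for label, genes in profiles.items():
        current = sum(genes.values())
        if current == 0:
            raise ValueError(f"cell type {label!r} has no expression to scale")
        scaled[label] = {gene: value * total / current for gene, value in genes.items()}
    return scaled


# Toy data: three cells, two cell types, three genes (all names hypothetical).
counts = {
    "cell1": {"GeneA": 2, "GeneB": 0, "GeneC": 0},
    "cell2": {"GeneA": 4, "GeneB": 2, "GeneC": 0},
    "cell3": {"GeneA": 1, "GeneB": 9, "GeneC": 0},
}
cell_types = {"cell1": "neuron", "cell2": "neuron", "cell3": "astrocyte"}

means = mean_expression_by_cell_type(counts, cell_types)
filtered = drop_unexpressed_genes(means)
scaled = scale_to_total(filtered)

for label, genes in scaled.items():
    print(label, genes)
```

In this toy example, GeneC is dropped because it is not expressed in any cell type, and each remaining profile sums to 1,000,000 after scaling, so values are comparable across cell types regardless of sequencing depth.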