
Chunk #49 — Online Methods — Expression data preprocessing — Empirical probe matching

Source
Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression.

Text

We then used the RNA-seq permuted Z-score matrices as a gold standard reference and calculated, for each gene, the Pearson correlation coefficients with all other genes, yielding a correlation profile for each gene. We repeated the same analysis for the Illumina meta-analysis and for the two Affymetrix platforms. Finally, we correlated the correlation profiles from each array platform with the correlation profiles from RNA-seq. If multiple probes detected the expression of one gene, we selected the probe showing the highest Pearson correlation with the corresponding gene in the RNA-seq data and treated the selected probes as the matching expression features in the combined meta-analyses. This yielded 19,942 genes that were detected in the RNA-seq datasets and tested in the combined meta-analyses. Genes and probes were matched to Ensembl v71 (ref. 60; ftp://ftp.ensembl.org/pub/release-71/gtf/homo_sapiens/Homo_sapiens.GRCh37.71.gtf.gz) stable gene IDs and HGNC symbols in all analyses.
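The probe selection step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `pick_best_probes`, its arguments, and the input layout (rows are genes or probes, columns are permuted Z-scores) are hypothetical. The text does not say how array and RNA-seq profiles were aligned, so the sketch assumes, purely for illustration, that genes measured by exactly one probe serve as the shared axis of each correlation profile.

```python
import numpy as np
from collections import Counter

def pick_best_probes(rnaseq_z, genes, array_z, probe_genes):
    """Return {gene: probe_row_index}, choosing per gene the probe whose
    correlation profile agrees best with the RNA-seq correlation profile.

    rnaseq_z    : (n_genes, n_cols) array, one row per gene in `genes`
    array_z     : (n_probes, n_cols) array, one row per entry of `probe_genes`
    probe_genes : gene ID that each array probe maps to
    """
    gene_row = {g: i for i, g in enumerate(genes)}
    counts = Counter(probe_genes)

    # ASSUMPTION (not from the paper): single-probe genes that are also
    # present in RNA-seq define the common axis of the profiles.
    anchors = [i for i, g in enumerate(probe_genes)
               if counts[g] == 1 and g in gene_row]
    anchor_rna = [gene_row[probe_genes[i]] for i in anchors]

    rna_corr = np.corrcoef(rnaseq_z)  # gene-by-gene Pearson correlations
    arr_corr = np.corrcoef(array_z)   # probe-by-probe Pearson correlations

    best = {}
    for p, g in enumerate(probe_genes):
        if g not in gene_row:
            continue  # gene not detected in RNA-seq
        # Exclude the probe's own row so self-correlation is not in the profile.
        keep = [k for k, a in enumerate(anchors) if a != p]
        arr_profile = arr_corr[p, [anchors[k] for k in keep]]
        rna_profile = rna_corr[gene_row[g], [anchor_rna[k] for k in keep]]
        # Correlate the two correlation profiles.
        score = np.corrcoef(arr_profile, rna_profile)[0, 1]
        if g not in best or score > best[g][1]:
            best[g] = (p, score)
    return {g: p for g, (p, _) in best.items()}
```

Under these assumptions, a probe that tracks the same co-expression structure as the RNA-seq gene scores close to 1, while a probe dominated by noise or cross-hybridization scores lower and is discarded.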