
Chunk #61 — Method Details — Evaluating performance of reduced representations of the transcriptome

Source
A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

Text

To simulate the performance of measuring only a subset of the transcriptome, we asked what number of landmarks (k) would optimally recover the connections observed in the pilot Connectivity Map dataset, which was based on Affymetrix arrays (dataset DSCMAP-AFFX). Specifically, prior work indicated that 25 query signatures yielded robust and expected connections to small molecules in the CMap pilot dataset (Table S1). We therefore used those 25 signatures to query inferred versions of DSCMAP-AFFX at various values of k, counting how often we recovered the connections observed in the original dataset at a comparable rank based on the Kolmogorov-Smirnov statistic. For each value of k, ranging from 100 to 10,000, we generated an imputed version of DSCMAP-AFFX using ordinary least squares (OLS) regression (trained on samples from DSGEO) with the k landmarks as the independent variables, queried it with the benchmark signatures, and assessed the percentage of connections that were recovered.
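The imputation and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' code. The function names (`fit_ols`, `impute`, `percent_recovered`), the intercept term, the synthetic data, and in particular the `tolerance` parameter are assumptions made here for illustration; the text says only that connections were recovered "at a comparable rank" and does not state a numeric threshold.

```python
import numpy as np


def fit_ols(train_landmarks, train_targets):
    """Fit one OLS model per target gene, using landmark genes as predictors.

    train_landmarks: (n_samples, k) expression of the k landmark genes
    train_targets:   (n_samples, n_targets) expression of the genes to infer
    Returns a (k + 1, n_targets) weight matrix; row 0 is the intercept
    (including an intercept is an assumption of this sketch).
    """
    design = np.column_stack([np.ones(train_landmarks.shape[0]), train_landmarks])
    weights, *_ = np.linalg.lstsq(design, train_targets, rcond=None)
    return weights


def impute(landmarks, weights):
    """Infer target-gene expression for new samples from their landmark values."""
    design = np.column_stack([np.ones(landmarks.shape[0]), landmarks])
    return design @ weights


def percent_recovered(original_ranks, imputed_ranks, tolerance):
    """Percentage of benchmark connections whose rank in the imputed data
    lies within `tolerance` positions of the rank in the original data.

    `tolerance` is a hypothetical stand-in for "a comparable rank".
    """
    original_ranks = np.asarray(original_ranks)
    imputed_ranks = np.asarray(imputed_ranks)
    hits = np.abs(imputed_ranks - original_ranks) <= tolerance
    return 100.0 * hits.mean()


# Synthetic stand-ins for the training data (DSGEO in the text) and the
# dataset to be imputed (DSCMAP-AFFX). These values are made up.
rng = np.random.default_rng(0)
k, n_targets = 10, 30
true_weights = rng.normal(size=(k, n_targets))
train_landmarks = rng.normal(size=(200, k))
test_landmarks = rng.normal(size=(50, k))
train_targets = train_landmarks @ true_weights
test_targets = test_landmarks @ true_weights

weights = fit_ols(train_landmarks, train_targets)
imputed = impute(test_landmarks, weights)

# Made-up ranks for four benchmark connections, before and after imputation.
recovery = percent_recovered([1, 5, 10, 40], [2, 5, 30, 41], tolerance=2)
```

In a full evaluation, this fit, impute, query, and score sequence would be repeated for each candidate k, and the recovery percentage compared across values of k. The ranking of connections by the Kolmogorov-Smirnov statistic is not implemented in this sketch.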