We hypothesized that any cellular state might be captured at low cost by measuring a reduced representation of the transcriptome. To explore this, we analyzed 12,031 Affymetrix HGU133A expression profiles from the Gene Expression Omnibus (GEO) and used them to identify the optimal number, k, of informative transcripts, which we term ‘landmark’ transcripts. If k were too small, too much information might be lost; if k were too large, the cost reduction relative to profiling the entire transcriptome might be insufficient. This analysis showed that 1,000 landmarks were sufficient to recover 82% of the information in the full transcriptome (see STAR Methods). The 1,000 landmarks were selected using a data-driven approach rather than on the basis of prior biological knowledge, as detailed in STAR Methods.
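The trade-off in choosing k can be illustrated with a minimal sketch. It is not the procedure described in STAR Methods: the data are simulated rather than taken from GEO, landmarks are picked by a placeholder highest-variance rule, and "information recovered" is approximated as the fraction of held-out variance in non-landmark genes explained by linear regression on the landmarks. All function names and parameter values below are hypothetical.

```python
import numpy as np


def simulate_profiles(n_samples, n_genes, n_factors, noise, seed=0):
    """Simulate expression profiles driven by a few latent factors (assumption)."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n_samples, n_factors))
    loadings = rng.normal(size=(n_factors, n_genes))
    return factors @ loadings + noise * rng.normal(size=(n_samples, n_genes))


def pick_landmarks_by_variance(train, k):
    """Placeholder selection rule: the k highest-variance genes.

    This stands in for the data-driven selection; it is not the paper's method.
    """
    return np.argsort(train.var(axis=0))[-k:]


def recovery_fraction(train, test, landmark_idx):
    """Fraction of held-out variance in non-landmark genes explained by
    least-squares regression on the landmark genes."""
    n_genes = train.shape[1]
    target_idx = np.setdiff1d(np.arange(n_genes), landmark_idx)
    mu = train.mean(axis=0)  # center with training means only

    x_train = train[:, landmark_idx] - mu[landmark_idx]
    y_train = train[:, target_idx] - mu[target_idx]
    coef, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

    x_test = test[:, landmark_idx] - mu[landmark_idx]
    y_test = test[:, target_idx] - mu[target_idx]
    residual = y_test - x_test @ coef
    return 1.0 - (residual ** 2).sum() / (y_test ** 2).sum()


def sweep_landmark_counts(ks, seed=0):
    """Return {k: recovery fraction} for each candidate landmark count."""
    data = simulate_profiles(n_samples=600, n_genes=200, n_factors=10,
                             noise=0.5, seed=seed)
    train, test = data[:400], data[400:]
    return {k: recovery_fraction(train, test, pick_landmarks_by_variance(train, k))
            for k in ks}


if __name__ == "__main__":
    for k, frac in sweep_landmark_counts([5, 20, 80]).items():
        print(f"k={k:3d}  recovered fraction of held-out variance: {frac:.3f}")
```

In this toy setting, recovery rises steeply while k is below the number of latent factors and then flattens, which mirrors the reasoning behind choosing the smallest k that still recovers most of the information.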