Despite the popularity of the pilot Connectivity Map dataset, its small scale limits its utility. With only 164 drug perturbations in only 3 cancer cell lines, the database lacks the richness required of a truly genome-scale resource. Missing is a diversity of chemical and genetic perturbations, as well as a diversity of cell types. Unfortunately, the high cost of commercial gene expression microarrays, and even of RNA sequencing, precludes such a genome-scale Connectivity Map. We therefore describe here a new approach to gene expression profiling based on a reduced representation of the transcriptome. This method, which we call L1000, is high-throughput and low-cost, and is thus well suited to a large-scale Connectivity Map. We report here the first 1,319,138 L1000 profiles as part of the NIH LINCS initiative.