Cells can be viewed as residing on a low-dimensional manifold in the high-dimensional gene expression space. However, the high dimensionality and sparseness of this space create the “curse of dimensionality,” where distance measures become nearly uninformative. A second issue concerns measurement noise, with generally low counts and large numbers of dropouts (false negatives). Both of these issues can be mitigated by (1) selecting a reduced set of informative genes and (2) linearly projecting the data to a transformed space where each coordinate corresponds to many co-regulated genes. The most effective way of selecting informative genes would be to select them relative to known classes. Because such classes are not known at the outset, we developed a staged procedure to learn the manifold.
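The two mitigation steps above can be illustrated with a minimal sketch. This is not the procedure used in this work: the variance-based ranking of log-transformed counts, the use of PCA (via SVD) as the linear projection, the function names `select_informative_genes` and `project_linear`, and the synthetic Poisson data are all assumptions made purely for illustration.

```python
import numpy as np


def select_informative_genes(counts, n_genes):
    """Return sorted indices of the n_genes with the highest variance.

    Assumption: variance of log1p-transformed counts is used as a simple
    stand-in for "informativeness"; other criteria are equally possible.
    """
    log_counts = np.log1p(counts)
    variances = log_counts.var(axis=0)
    top = np.argsort(variances)[::-1][:n_genes]
    return np.sort(top)


def project_linear(counts, gene_idx, n_components):
    """Project cells onto the leading principal components.

    Each component is a linear combination of many genes, so genes that
    vary together contribute to the same coordinate.
    """
    x = np.log1p(counts[:, gene_idx])
    x = x - x.mean(axis=0)  # center each gene
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


# Synthetic example: 200 cells x 1000 genes of sparse, low counts,
# with 20 genes elevated in the first 100 cells (a hypothetical "class").
rng = np.random.default_rng(0)
counts = rng.poisson(0.5, size=(200, 1000)).astype(float)
counts[:100, :20] += rng.poisson(5.0, size=(100, 20))

genes = select_informative_genes(counts, n_genes=50)
embedding = project_linear(counts, genes, n_components=10)
```

Here `embedding` holds one row per cell and one column per component, so distances between cells can be computed in 10 dimensions rather than 1000. In a staged approach, a first-pass embedding of this kind could be used to define provisional classes, which in turn could guide a more targeted gene selection.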