The Franke lab data set is an aggregation of publicly available microarray gene expression data sets comprising 37,427 samples in human, mouse, and rat [17,18]. We downloaded the publicly available gene expression data from the DEPICT website (see URLs). The available gene expression values already quantify relative expression for a tissue or cell type rather than absolute expression for a single sample [17,18], so we used these values in place of our t-statistics. Several pairs of tissues had values that were correlated at r² > 0.99, including several with r² = 1. We pruned the data so that no two tissues had r² > 0.99. Most of the closely correlated pairs were also closely related biologically, so the interpretation did not depend on which tissue we chose to keep (e.g., plasma and plasma cells, joint and joint capsule). For pairs in which one tissue was more specific than the other, we kept the more specific tissue (e.g., nose vs. nasal mucosa, quadriceps muscle vs. skeletal muscle). There were two clusters of highly correlated tissues for which we decided to remove the entire cluster, not