To our knowledge, one of the first studies to combine multiple microbiome datasets was the work by Lozupone and Knight [47], which aggregated sequence data from hundreds of studies to determine which environmental factors explained the observed differences in microbial community structure. They found that samples collected in the natural environment across a multitude of gradients (e.g., pH, temperature, atmospheric pressure) separated primarily according to whether they originated from saline or non-saline environments, despite the substantial technical differences between studies. Notably, when these same data were combined with samples collected from vertebrate guts, the primary variation in the data was explained by whether the samples were environmental or host associated [1]. This implies that an extremely high degree of specialization has occurred in the microbial communities of vertebrate guts, which is particularly interesting given how much longer environmental microbial communities have had to specialize than vertebrates have existed. While this meta-analysis did not employ a reference set of the type discussed here, it has itself become a de facto reference set that has subsequently been used for comparison in numerous other studies [48–50].