different environments) can lead to statistically significant patterns in the data that are not driven by the underlying biology. As an example, imagine three samples A, B, and C, where A is composed of Escherichia coli and both B and C are composed of Escherichia coli and Bacillus subtilis. If the reference contains only Escherichia coli, then Bacillus subtilis goes undetected and all three samples appear to be quite similar. However, if the reference also includes Bacillus subtilis, the conclusion is quite different: A is less similar to B and C than B and C are to each other.
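
The three-sample example can be made concrete with a minimal sketch. It assumes presence/absence profiles and Jaccard similarity, neither of which is prescribed by the text; the function names (`observed`, `jaccard`, `pairwise_similarity`) are hypothetical and chosen only for illustration. A taxon is treated as detectable only if it is present in the reference.

```python
from itertools import combinations

# True composition of each sample (presence/absence only; an assumption).
samples = {
    "A": {"Escherichia coli"},
    "B": {"Escherichia coli", "Bacillus subtilis"},
    "C": {"Escherichia coli", "Bacillus subtilis"},
}


def observed(sample, reference):
    """Taxa that can be detected: those in the sample AND in the reference."""
    return sample & reference


def jaccard(x, y):
    """Jaccard similarity of two taxon sets (1.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def pairwise_similarity(samples, reference):
    """Jaccard similarity for every pair of samples, as seen through a reference."""
    profiles = {name: observed(taxa, reference) for name, taxa in samples.items()}
    return {
        (a, b): jaccard(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Reference containing only E. coli: B. subtilis is invisible.
ecoli_only = pairwise_similarity(samples, {"Escherichia coli"})
# Reference containing both organisms.
full = pairwise_similarity(samples, {"Escherichia coli", "Bacillus subtilis"})

print(ecoli_only)  # every pair has similarity 1.0
print(full)        # (A, B) and (A, C) drop to 0.5; (B, C) stays at 1.0
```

With the Escherichia coli-only reference every pair is indistinguishable, whereas the fuller reference separates A from B and C. An abundance-based measure would change the numbers but not the qualitative effect of the reference on the comparison.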