An exemplar of this issue is the assignment of operational taxonomic units (OTUs). The primary data type used in the analysis of a microbiome study is the OTU table [43, 44], a matrix in which the rows represent observations (OTUs), the columns represent samples, and each element gives the count of a given observation within a sample. To be comparable, a reference and a study must have their sequence data assigned to a shared set of OTUs (i.e., partitioned into a common set of bins). OTUs themselves are clusters of similar sequences, with the similarity threshold generally set at 97% sequence identity, and are typically determined in one of three ways, as summarized in Table 1 (for a comprehensive review of OTU picking, see [45]; each of these methods is named in terms of its OTU reference, but note that this is a distinct concept from the reference datasets discussed throughout). The first is a closed-reference approach in which all the sequence data for the input study and the microbiome