
Chunk #16 — Review — Technical challenges to employing reference sets

Source
Context and the human microbiome.

Text

An exemplar of this issue is the assignment of operational taxonomic units (OTUs). The primary data type used in the analysis of a microbiome study is the OTU table [43, 44]. This is a matrix in which the rows represent observations (OTUs), the columns represent samples, and each element is the count of a given observation within a given sample. To be comparable, a reference and a study must have their sequence data assigned to a shared set of OTUs (i.e., partitioned into a common set of bins). OTUs themselves are clusters of similar sequences, with the similarity threshold generally set at 97% sequence identity. They are typically determined in one of three ways, as summarized in Table 1 (for a comprehensive review of OTU picking, see [45]). Each of these methods is named for how it uses an OTU reference, but nota bene that this is a distinct concept from the reference datasets discussed throughout. The first is a closed-reference approach in which all the sequence data for the input study and the microbiome
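The OTU table layout and the requirement for a shared set of bins can be illustrated with a minimal Python sketch. It assumes that reads have already been assigned to OTU IDs by one common OTU-picking step, so that equal IDs in the study and the reference denote the same bin. The functions `build_otu_table` and `merge_on_shared_otus`, and all sample and OTU IDs, are hypothetical. The plain nested-list representation is illustrative only and is not the format used by any particular tool.

```python
from collections import Counter


def build_otu_table(assignments):
    """Build an OTU table: rows are observations (OTUs), columns are samples.

    `assignments` maps each sample ID to the list of OTU IDs assigned to its
    reads (one entry per read). Cell [i][j] is the count of OTU i in sample j.
    """
    samples = sorted(assignments)
    counts = {s: Counter(assignments[s]) for s in samples}
    otus = sorted({otu for c in counts.values() for otu in c})
    # Counter returns 0 for an OTU that a sample never observed.
    table = [[counts[s][otu] for s in samples] for otu in otus]
    return otus, samples, table


def merge_on_shared_otus(table_a, table_b):
    """Combine two OTU tables onto one common set of OTU rows.

    Assumption: both tables' OTU IDs come from the same OTU assignment, so an
    ID means the same bin in both. OTUs missing from one table get zero counts
    in that table's samples. If the tables were clustered independently, the
    IDs are not comparable and this merge would be meaningless.
    """
    otus_a, samples_a, mat_a = table_a
    otus_b, samples_b, mat_b = table_b
    if set(samples_a) & set(samples_b):
        raise ValueError("sample IDs must be distinct across tables")
    rows_a = dict(zip(otus_a, mat_a))
    rows_b = dict(zip(otus_b, mat_b))
    zeros_a = [0] * len(samples_a)
    zeros_b = [0] * len(samples_b)
    otus = sorted(set(otus_a) | set(otus_b))
    merged = [rows_a.get(o, zeros_a) + rows_b.get(o, zeros_b) for o in otus]
    return otus, samples_a + samples_b, merged


# Hypothetical example: a two-sample study and a one-sample reference.
study = build_otu_table({"s1": ["OTU_1", "OTU_1", "OTU_2"], "s2": ["OTU_2"]})
reference = build_otu_table({"r1": ["OTU_1", "OTU_3"]})
otus, samples, merged = merge_on_shared_otus(study, reference)
for otu, row in zip(otus, merged):
    print(otu, row)
```

Zero-filling is correct here only because every table was binned against the same OTU definitions. This is the comparability requirement described above, expressed as a precondition of the merge rather than something the code can check.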