Review: Technical challenges to employing reference sets

Source: Context and the human microbiome.

of its OTU reference, but nota bene that this represents a distinct concept from that of the reference datasets discussed throughout). The first is a closed-reference approach, in which all the sequence data for the input study and the microbiome reference set are compared against a curated 16S rRNA database such as Greengenes [46] to identify which known OTUs are represented. This is computationally tractable even for very large studies, because each sequence is evaluated independently of every other and because the reference dataset's OTU assignments can be computed just once (and in advance). The second strategy, known as de novo picking, defines novel OTUs based on the sequences in a study. This is computationally expensive, because all the data must be held in memory to determine the clusters, and the process is difficult to parallelize. The third approach, open-reference picking, is a hybrid method in which sequences are first compared to a database of known OTUs as described above, and those that fail to match a known OTU are then passed to a de novo step.

Table 1. A comparison of OTU-picking strategies

Strategy | Pros | Cons | Data combination bias
Closed-reference | • Is extremely parallelizable | • Is limited to finding diversity present
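The three OTU-picking strategies described above can be contrasted in a small sketch. This is a toy illustration under stated assumptions. It is not the implementation used by any particular pipeline.

Assumptions and hypothetical names:
- `identity()` is a hypothetical similarity measure: the fraction of matching positions between pre-aligned, equal-length reads. Real tools use proper alignment.
- `de_novo()` is a simple greedy centroid clustering chosen for illustration.
- The default `threshold=0.97` is an assumed, commonly used cutoff. The source does not specify one.
- `closed_reference()` and `open_reference()` are also hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Hypothetical toy similarity: fraction of identical positions between
    two pre-aligned, equal-length sequences (real tools align properly)."""
    if len(a) != len(b):
        raise ValueError("toy identity() expects equal-length, pre-aligned reads")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def closed_reference(seqs: List[str], reference: Dict[str, str],
                     threshold: float = 0.97) -> Tuple[Dict[int, str], List[int]]:
    """Assign each read to its best-matching reference OTU, if any.

    Each read is scored only against the fixed reference, never against other
    reads, so this loop could be split across workers trivially. Reads whose
    best hit falls below `threshold` are returned as failures.
    """
    assignments: Dict[int, str] = {}
    failures: List[int] = []
    for i, s in enumerate(seqs):
        best_id: Optional[str] = None
        best_score = -1.0
        for otu_id, ref_seq in reference.items():
            score = identity(s, ref_seq)
            if score > best_score:
                best_id, best_score = otu_id, score
        if best_id is not None and best_score >= threshold:
            assignments[i] = best_id
        else:
            failures.append(i)
    return assignments, failures


def de_novo(seqs: List[str], threshold: float = 0.97,
            prefix: str = "denovo") -> Dict[int, str]:
    """Greedy centroid clustering (one simple de novo method, assumed here).

    Each read joins the first existing centroid it matches; otherwise it
    founds a new OTU. Every read is compared with the growing centroid set,
    which must stay in memory, and the result depends on read order, which
    is why this step is hard to parallelize.
    """
    centroids: List[Tuple[str, str]] = []  # (otu_id, centroid sequence)
    assignments: Dict[int, str] = {}
    for i, s in enumerate(seqs):
        for otu_id, centroid in centroids:
            if identity(s, centroid) >= threshold:
                assignments[i] = otu_id
                break
        else:  # no centroid matched: start a new OTU
            otu_id = f"{prefix}{len(centroids)}"
            centroids.append((otu_id, s))
            assignments[i] = otu_id
    return assignments


def open_reference(seqs: List[str], reference: Dict[str, str],
                   threshold: float = 0.97) -> Dict[int, str]:
    """Hybrid: closed-reference first, then de novo on the leftovers only."""
    assignments, failures = closed_reference(seqs, reference, threshold)
    leftovers = [seqs[i] for i in failures]
    novel = de_novo(leftovers, threshold)
    for j, i in enumerate(failures):
        assignments[i] = novel[j]
    return assignments


if __name__ == "__main__":
    reference = {"OTU_A": "AAAAAAAAAA", "OTU_B": "CCCCCCCCCC"}
    reads = ["AAAAAAAAAA", "AAAAAAAAAT", "GGGGGGGGGG", "GGGGGGGGGT", "CCCCCCCCCC"]
    print(closed_reference(reads, reference, threshold=0.9))
    print(open_reference(reads, reference, threshold=0.9))
```

Running the demo with a 0.9 cutoff on these 10-base toy reads shows the trade-off.

- The closed-reference pass assigns reads 0, 1, and 4 to known OTUs. It cannot place the two G-rich reads, because that diversity is absent from the reference.
- The open-reference pass recovers those two reads as a single novel OTU.

Restricting the costly, order-dependent clustering to the unmatched reads is what makes the hybrid cheaper than running de novo picking on everything.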