Chunk #81 — Materials and methods — Complete Genomics data

Source: Characterizing and measuring bias in sequence data.
Embedded: yes

Text

All statistics were computed on BAM files provided by Complete Genomics. Complete Genomics' pipeline [49] first maps all reads that can be aligned to the reference with very few errors, and then uses local assembly, constrained by read-pairing information, to accumulate evidence of variation from the remaining reads. Unlike the standard Complete Genomics BAM representations, these BAM files represent both the aligned and the locally assembled reads: they contain a single record for every read, representing its highest-scoring alignment to the reference, and use padded alignment to represent the relationships produced by the local assembler (personal communication, Srinka Ghosh, Complete Genomics). Where multiple equally good alignments/assemblies existed for a particular read pair, the file contains one chosen at random, similar to the policies of the aligners used for the other technologies. For the purpose of measuring coverage, this representation is superior to the BAMs produced by Complete Genomics' publicly available tools because it unifies the alignment and assembly data and presents a single 'best' alignment/assembly for each read pair.
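
As an illustration of why a single 'best' record per read simplifies coverage measurement, the following is a minimal Python sketch, not the pipeline used in the study. It assumes that each read comes with a list of candidate alignments as (score, position, CIGAR) tuples, which is a hypothetical layout, and that padding is encoded with the SAM CIGAR operator P, which consumes neither read nor reference bases; the text does not state the exact encoding used in these files. The names pick_best and add_coverage are invented for this example, and counting only M, = and X operations as coverage (so deleted bases are not counted) is likewise an assumption.

```python
import random
import re
from collections import Counter

# CIGAR operations that advance along the reference (per the SAM specification).
REF_CONSUMING = set("MDN=X")
# Operations counted as covering a reference base in this sketch (assumption:
# aligned bases only, so deletions and skipped regions add no depth).
COVERING = set("M=X")

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def pick_best(candidates, rng=random):
    """Return one highest-scoring candidate, breaking ties uniformly at random.

    `candidates` is a list of (score, pos, cigar) tuples (hypothetical layout).
    """
    best = max(score for score, _, _ in candidates)
    tied = [c for c in candidates if c[0] == best]
    return rng.choice(tied)


def add_coverage(depth, pos, cigar):
    """Add one alignment record to `depth`, a Counter keyed by 0-based position."""
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in COVERING:
            for i in range(ref, ref + length):
                depth[i] += 1
        if op in REF_CONSUMING:
            ref += length
        # I, S, H and P leave the reference coordinate unchanged.
    return depth


# Toy input: read2 has two equally good candidates and one worse candidate.
reads = {
    "read1": [(60, 100, "5M2P5M")],
    "read2": [(40, 103, "4M1D4M"), (40, 500, "8M"), (30, 103, "8M")],
}

rng = random.Random(0)  # seeded only to make the example repeatable
depth = Counter()
for name, candidates in reads.items():
    _, pos, cigar = pick_best(candidates, rng)
    add_coverage(depth, pos, cigar)
```

Because every read contributes exactly one record, per-base depth is a simple sum over records, with no need to reconcile separate mapping and assembly outputs or to down-weight multiply reported reads.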