Chunk #44 — Materials and methods — Primary data analysis

Source: Detection of low prevalence somatic mutations in solid tumors with ultra-deep targeted sequencing.
Embedded: yes

Text

Reads were aligned to the hg19 genome using BWA (version 0.5.7) [26] in single-end mode (samse option). Reads containing gapped alignments ([ID] MD tag) were then filtered out using SAMtools [27]. Our approach, which relies on error rate estimation and statistical analysis, was optimized to identify nucleotide substitutions. Detecting indels from gapped alignments requires a more sophisticated approach, which will be developed in future studies. Reads from the forward and reverse strands were kept separate for the analysis so that the sequencing error rate could be estimated independently for each read direction. In addition, for positions covered by two types of reads (either forward and reverse reads, or reads from overlapping amplicons), an independent statistical analysis was performed for each type. Each read was assigned to one amplicon based on its start coordinate. For each position of each amplicon, we counted the occurrences of each nucleotide with a PHRED quality greater than 20 to generate a 'pileup' table. The raw read files are publicly available in the NCBI Short Read Archive (SRP009487.1) [28].
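
The filtering and counting steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study. The read tuples, the amplicon table, and the function names (has_gap, assign_amplicon, build_pileup) are hypothetical. The sketch assumes that gaps are recognized as I or D operations in the CIGAR string, that alignments are unclipped, and that forward reads begin at the amplicon start while reverse reads end at the amplicon end; the text states only that assignment used the read start coordinate.

```python
import re
from collections import defaultdict

MIN_QUAL = 20  # keep bases with PHRED quality strictly greater than this


def has_gap(cigar):
    """Return True if the CIGAR string has an insertion (I) or deletion (D)."""
    return re.search(r"[ID]", cigar) is not None


def assign_amplicon(start, end, strand, amplicons, tol=0):
    """Return the amplicon name matching the read, or None.

    Assumption (not from the paper): forward reads begin at the amplicon
    start, reverse reads end at the amplicon end; coordinates are inclusive.
    """
    for name, (a_start, a_end) in amplicons.items():
        anchor, target = (start, a_start) if strand == "+" else (end, a_end)
        if abs(anchor - target) <= tol:
            return name
    return None


def build_pileup(reads, amplicons):
    """Count A/C/G/T per (amplicon, strand, position) for high-quality bases."""
    pileup = defaultdict(lambda: dict.fromkeys("ACGT", 0))
    for start, strand, cigar, seq, quals in reads:
        if has_gap(cigar):
            continue  # gapped alignments are discarded
        end = start + len(seq) - 1  # valid because the alignment is ungapped
        amplicon = assign_amplicon(start, end, strand, amplicons)
        if amplicon is None:
            continue
        for offset, (base, qual) in enumerate(zip(seq, quals)):
            if qual > MIN_QUAL and base in "ACGT":
                # Strands are kept separate so that error rates can later be
                # estimated independently for each read direction.
                pileup[(amplicon, strand, start + offset)][base] += 1
    return pileup


# Toy data (hypothetical): (leftmost position, strand, CIGAR, sequence, qualities)
amplicons = {"amp1": (100, 109)}
reads = [
    (100, "+", "5M", "ACGTA", [30, 30, 10, 30, 30]),  # third base is low quality
    (100, "+", "5M", "ACGTA", [30] * 5),
    (105, "-", "5M", "CCGTA", [30] * 5),              # ends at the amplicon end
    (100, "+", "2M1I2M", "ACGTA", [30] * 5),          # gapped, filtered out
]
pileup = build_pileup(reads, amplicons)
```

Keying the table by (amplicon, strand, position) keeps forward and reverse counts apart, and keeps counts from overlapping amplicons apart, so each read type can be analysed on its own.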