Chunk #27 — Review — Chromatin accessibility high-throughput sequence data analysis — Stage 1 analysis

Source: Chromatin accessibility: a window into the genome.

Initially, raw sequencing reads are demultiplexed (Step 1) into per-sample FASTQ files on the basis of their index sequences using CASAVA (Illumina), and are then aligned (Step 2) to a user-defined reference genome (for example, human or mouse) [105]. Numerous alignment tools are available, such as Maq, RMAP, CloudBurst, SOAP, SHRiMP, BWA and Bowtie [106]; the last two are currently the most popular. During the alignment process the data are filtered (Step 3) to remove reads from regions of the genome that are overrepresented because of technical bias. Tag filtering is often performed with SAMtools [107] or Picard tools (http://broadinstitute.github.io/picard). For ATAC-seq data specifically, mapped fragments shorter than 38 bp are removed, because steric hindrance makes 38 bp the minimum spacing between transposition events [102]. ATAC-seq reads mapping to the mitochondrial genome are also discarded, as they are unrelated to the scope of the experiment. Sequencing performance QC (Step 3) is likewise performed during the alignment process by computing a set of summary metrics for each sequenced sample, namely the total number of reads, the percentage of unpaired reads, the percentages of reads aligned zero times, exactly once and more than once, and the overall alignment rate.
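As an illustration of Step 1, the sketch below assigns reads to samples by comparing the index sequence stored at the end of each FASTQ header against a sample sheet, tolerating one mismatch. It is a simplified, hypothetical stand-in, not CASAVA's implementation: the Illumina-style header layout (`... 1:N:0:ATCACG`), the function names `demultiplex` and `assign_sample`, the sample sheet and the one-mismatch threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_seq: str, sample_sheet: dict, max_mismatches: int = 1):
    """Return the single sample whose index is within max_mismatches, or None."""
    hits = [name for name, idx in sample_sheet.items()
            if len(idx) == len(index_seq) and hamming(idx, index_seq) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None  # no match or ambiguous match -> None

def demultiplex(records, sample_sheet: dict, max_mismatches: int = 1) -> dict:
    """Group (header, seq, qual) FASTQ records into per-sample bins by index sequence."""
    bins = defaultdict(list)
    for header, seq, qual in records:
        # Assumed Illumina-style header: the index is the last ':'-separated field.
        index_seq = header.rsplit(":", 1)[-1]
        sample = assign_sample(index_seq, sample_sheet, max_mismatches)
        bins[sample if sample is not None else "Undetermined"].append((header, seq, qual))
    return dict(bins)
```

Reads whose index matches no sample, or matches more than one sample within the tolerance, go to an `Undetermined` bin rather than being assigned arbitrarily, which keeps ambiguous reads out of the per-sample files.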
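The two ATAC-seq-specific filters described for Step 3 can be sketched on plain-text SAM records, using the reference name (column 3, RNAME) and the template length (column 9, TLEN). This is a minimal illustration under stated assumptions, not a substitute for SAMtools or Picard: the mitochondrial contig names (`chrM`, `MT`) depend on the reference build, and records without a computed fragment length (TLEN 0, for example unmapped reads or mates on different chromosomes) are dropped here by choice. The names `keep_atac_alignment` and `filter_sam` are hypothetical.

```python
from typing import Iterable, Iterator

MIN_FRAGMENT_BP = 38            # minimum transposition spacing cited in the text [102]
MITO_CONTIGS = {"chrM", "MT"}   # assumption: mitochondrial contig name depends on the reference build

def keep_atac_alignment(sam_line: str) -> bool:
    """Return True if a SAM alignment line passes both ATAC-seq filters."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, tlen = fields[2], int(fields[8])   # RNAME (column 3) and TLEN (column 9)
    if rname in MITO_CONTIGS:
        return False                           # discard mitochondrial reads
    # TLEN is negative for the rightmost mate, so compare its absolute value;
    # TLEN 0 (no fragment length available) fails this test and is dropped.
    return abs(tlen) >= MIN_FRAGMENT_BP

def filter_sam(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only the alignments that pass the filters."""
    for line in lines:
        if line.startswith("@") or keep_atac_alignment(line):
            yield line
```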
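The per-sample metrics listed for sequencing performance QC can be derived from per-read alignment counts. The sketch below assumes a hypothetical `ReadAlignment` record holding whether a read came from a paired-end library and how many loci it aligned to. It treats every read independently and does not reproduce any aligner's own summary (Bowtie2, for instance, reports paired-end results as concordant and discordant pair alignments).

```python
from dataclasses import dataclass

@dataclass
class ReadAlignment:
    paired: bool        # True if the read came from a paired-end library
    n_alignments: int   # number of distinct genomic loci the read aligned to

def alignment_qc(reads: list) -> dict:
    """Compute simplified per-sample alignment QC metrics as percentages of all reads."""
    total = len(reads)
    if total == 0:
        raise ValueError("no reads supplied")

    def pct(n: int) -> float:
        return 100.0 * n / total

    zero = sum(r.n_alignments == 0 for r in reads)
    once = sum(r.n_alignments == 1 for r in reads)
    multi = sum(r.n_alignments > 1 for r in reads)
    unpaired = sum(not r.paired for r in reads)
    return {
        "total_reads": total,
        "pct_unpaired": pct(unpaired),
        "pct_aligned_0_times": pct(zero),
        "pct_aligned_exactly_once": pct(once),
        "pct_aligned_more_than_once": pct(multi),
        "overall_alignment_rate": pct(once + multi),  # reads aligned at least once
    }
```

Because the three alignment categories partition the reads, their percentages sum to 100 in this sketch, and the overall alignment rate equals 100 minus the percentage aligned zero times.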