Chunk #47 — Review — Chromatin accessibility high-throughput sequence data analysis — Stage 4 analysis

Source: Chromatin accessibility: a window into the genome.


Data annotation and integration represents the final and most informative stage of analysis, and it requires both computational skills and background knowledge of genomic organization and structure (Step 16). After enriched regions have been identified and metrics of nucleosome organization and TF occupancy have been estimated, it is often desirable to interpret these data in light of relevant information from other experiments. For example, a researcher can evaluate the overlap or association of the sequence data with genomic features (such as promoters, introns, intergenic regions, transcription start sites (TSSs), and transcription termination sites (TTSs)) and with ontology terms (such as molecular functions, biological processes, cellular components, and disease ontologies). For this purpose, BEDTools (documentation is available at http://bedtools.readthedocs.org) and its Python wrapper pybedtools provide a versatile suite of utilities for comparative and exploratory operations on genomic features, such as identifying overlaps between two datasets, extracting unique features, and merging enriched regions that lie within a predefined distance of one another [141, 142, 152]. The UCSC Genome Browser also offers a suite of command-line utilities tailored to data file conversion (http://genome.ucsc.edu/util.html). Finally, accessible chromatin regions can be compared against functional annotations with GREAT to identify significantly enriched pathways or ontologies and to direct future hypotheses [153].
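The overlap and merge operations mentioned above can be illustrated with a minimal pure-Python sketch. It assumes BED-style intervals (0-based, half-open) held in memory as `(chrom, start, end)` tuples. The helper names `overlaps`, `intersect_unique`, and `merge_within` are hypothetical and only mimic the intent of `bedtools intersect -u` and `bedtools merge -d`. The overlap scan is brute force, whereas BEDTools relies on indexed or sorted-input algorithms suited to genome-scale files.

```python
from typing import List, Tuple

# A BED-style interval: (chrom, start, end), 0-based, half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base.
    Book-ended intervals (a.end == b.start) do not overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_unique(a_set: List[Interval], b_set: List[Interval]) -> List[Interval]:
    """Report each interval of a_set once if it overlaps any interval of b_set
    (the intent of `bedtools intersect -u`). Brute force: O(len(a) * len(b))."""
    return [a for a in a_set if any(overlaps(a, b) for b in b_set)]


def merge_within(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge same-chromosome intervals separated by at most max_gap bases
    (the intent of `bedtools merge -d max_gap`). With max_gap=0, overlapping
    and book-ended intervals are merged."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):  # sort by chrom, then start
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peaks and promoter intervals for illustration.
peaks = [("chr1", 100, 200), ("chr1", 250, 300), ("chr1", 1000, 1100), ("chr2", 50, 80)]
promoters = [("chr1", 150, 160), ("chr2", 500, 600)]

print(intersect_unique(peaks, promoters))  # [('chr1', 100, 200)]
print(merge_within(peaks, max_gap=100))    # [('chr1', 100, 300), ('chr1', 1000, 1100), ('chr2', 50, 80)]
```

For real data, pybedtools exposes the same operations on `BedTool` objects, for example `a.intersect(b, u=True)` and `a.sort().merge(d=100)`, which call the underlying BEDTools programs.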
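Assigning accessible regions to genomic features such as promoters can likewise be sketched. The example below assumes gene models supplied as hypothetical `Gene` records (0-based, half-open gene bodies with a strand). For each peak it measures a signed distance from the peak midpoint to the nearest TSS in that gene's orientation, then labels the peak as promoter, genic, or intergenic. The 1,000 bp promoter window on each side of the TSS is an arbitrary assumption, and because no exon models are used, the genic class does not separate introns from exons.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    chrom: str
    start: int   # 0-based, half-open gene body [start, end)
    end: int
    strand: str  # "+" or "-"
    name: str

    @property
    def tss(self) -> int:
        # First transcribed base: `start` on the + strand, `end - 1` on the - strand.
        return self.start if self.strand == "+" else self.end - 1


def annotate_peak(chrom: str, start: int, end: int, genes: List[Gene],
                  upstream: int = 1000, downstream: int = 1000
                  ) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (category, nearest_gene_name, signed_distance_to_tss).
    The promoter window (upstream/downstream) is an illustrative assumption."""
    mid = (start + end) // 2
    nearest: Optional[Tuple[Gene, int]] = None
    for g in genes:
        if g.chrom != chrom:
            continue
        # Signed distance in the gene's orientation: negative = upstream of the TSS.
        d = mid - g.tss if g.strand == "+" else g.tss - mid
        if nearest is None or abs(d) < abs(nearest[1]):
            nearest = (g, d)
    if nearest is None:
        return ("intergenic", None, None)  # no genes on this chromosome
    gene, d = nearest
    if -upstream <= d <= downstream:
        return ("promoter", gene.name, d)
    # Outside the promoter window: genic if the peak overlaps any gene body.
    in_gene_body = any(g.chrom == chrom and g.start < end and start < g.end for g in genes)
    return ("genic" if in_gene_body else "intergenic", gene.name, d)


# Hypothetical gene models: GENE_A on +, GENE_B on -.
genes = [Gene("chr1", 5000, 20000, "+", "GENE_A"),
         Gene("chr1", 30000, 40000, "-", "GENE_B")]

print(annotate_peak("chr1", 4800, 5000, genes))    # ('promoter', 'GENE_A', -100)
print(annotate_peak("chr1", 12000, 12200, genes))  # ('genic', 'GENE_A', 7100)
print(annotate_peak("chr1", 25000, 25200, genes))  # ('intergenic', 'GENE_B', 14899)
print(annotate_peak("chr1", 40500, 40600, genes))  # ('promoter', 'GENE_B', -551)
```

The strand-aware distance matters for the last peak: it lies at higher coordinates than GENE_B's TSS, which is upstream for a minus-strand gene.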
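GREAT's region-based enrichment can be approximated as a one-sided binomial test: if the regulatory domains of genes annotated with an ontology term cover a fraction p of the genome, and k of n input regions fall within those domains, enrichment is scored by P(X ≥ k) with X ~ Binomial(n, p). The sketch below assumes the domain coverage for a term has already been computed. It does not reproduce GREAT's rules for defining regulatory domains, its gene-based hypergeometric test, or its multiple-testing correction. The function names and the input numbers are hypothetical.

```python
from math import exp, lgamma, log, log1p
from typing import Tuple


def binom_sf_inclusive(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so that large n
    (tens of thousands of peaks) does not overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    log_nfact = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = log_nfact - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q
        total += exp(log_pmf)
    return min(total, 1.0)


def great_style_binomial(n_regions: int, n_hits: int,
                         domain_bp: int, genome_bp: int) -> Tuple[float, float]:
    """Return (fold_enrichment, p_value) for one ontology term.
    domain_bp: non-overlapping bases covered by the term's regulatory domains
    (assumed precomputed here)."""
    p = domain_bp / genome_bp
    expected = n_regions * p
    fold = n_hits / expected if expected > 0 else float("inf")
    return fold, binom_sf_inclusive(n_hits, n_regions, p)


# Hypothetical term whose domains cover 30 Mb of a 3 Gb genome (p = 0.01):
# 60 of 1,000 accessible regions fall inside them, versus 10 expected.
fold, p_val = great_style_binomial(1000, 60, 30_000_000, 3_000_000_000)
print(fold)   # 6.0
print(p_val)
```

With SciPy available, `scipy.stats.binom.sf(k - 1, n, p)` should give the same tail probability, since `sf` returns P(X > x).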