Chunk #64 — Online Methods — 2. Chromatin state learning — 2.1 ‘Core’ 15-state model

Source: Integrative analysis of 111 reference human epigenomes.

Text

A ChromHMM model applicable to all 127 epigenomes was learned by virtually concatenating consolidated data corresponding to the core set of 5 chromatin marks assayed in all epigenomes (H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3). The model was trained on the 60 epigenomes with the highest-quality data (Fig. 2k), which provided sufficient coverage of the different lineages and tissue types (Table S1 - Sheet QCSummary). The ChromHMM parameters were as follows. Reads were shifted in the 5′ to 3′ direction by 100 bp. For each consolidated ChIP-seq dataset, read counts were computed in non-overlapping 200 bp bins across the entire genome. Each bin was discretized into two levels: 1 indicating enrichment and 0 indicating no enrichment. The binarization was performed by comparing ChIP-seq read counts to the corresponding whole-cell extract control read counts within each bin, using a Poisson p-value threshold of 1e-4 (the default discretization threshold in ChromHMM). We trained several models with the number of states ranging from 10 to 25. We decided to use a 15-state model (Fig. 4a-f, Extended Data 2b) for all further analyses since it
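
The read shifting, binning, and binarization steps described in this section can be illustrated with a minimal Python sketch. It is not ChromHMM's implementation. The 100 bp shift, 200 bp bin size, and 1e-4 threshold come from the text. The function names, the library-size scaling of the control, and the pseudocount added to the Poisson mean are assumptions made only for illustration, and ChromHMM's exact background model may differ.

```python
import math

BIN_SIZE = 200       # bp, from the text
SHIFT = 100          # bp, from the text
P_THRESHOLD = 1e-4   # Poisson p-value threshold, from the text


def shifted_bin(position, strand, bin_size=BIN_SIZE, shift=SHIFT):
    """Bin index of a read start after shifting it 5'->3' along its strand."""
    shifted = position + shift if strand == "+" else position - shift
    return max(shifted, 0) // bin_size


def bin_counts(reads, n_bins):
    """Count shifted reads per non-overlapping bin; reads are (position, strand)."""
    counts = [0] * n_bins
    for position, strand in reads:
        idx = shifted_bin(position, strand)
        if idx < n_bins:
            counts[idx] += 1
    return counts


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), obtained as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(signal, control, threshold=P_THRESHOLD, pseudocount=1.0):
    """Call a bin enriched (1) when its signal count is unlikely under the control.

    Assumption: the expected count per bin is the control count scaled by the
    ratio of total signal to total control reads, plus a pseudocount.
    """
    total_control = sum(control)
    scale = sum(signal) / total_control if total_control else 1.0
    calls = []
    for s, c in zip(signal, control):
        lam = scale * c + pseudocount
        calls.append(1 if poisson_sf(s, lam) < threshold else 0)
    return calls
```

For example, `binarize(bin_counts(chip_reads, n), bin_counts(control_reads, n))` would return one 0/1 call per bin for a single mark, where `chip_reads` and `control_reads` are hypothetical lists of `(position, strand)` tuples. Repeating this for each of the five core marks gives the binary vectors on which the model is trained.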