b. Mappability filtering, pooling and subsampling
To avoid artificial differences in signal strength caused by differences in sequencing depth, all consolidated histone mark datasets (except the additional histone marks for the 7 deeply profiled epigenomes, Fig. 2j) were uniformly subsampled to a maximum depth of 30 million reads (the median read depth over all consolidated samples). For the 7 deeply profiled reference epigenomes (Fig. 2j), histone mark datasets were subsampled to a maximum of 45 million reads (median depth). The consolidated DNase-seq datasets were subsampled to a maximum depth of 50 million reads (median depth). These uniformly subsampled datasets were then used for all further processing steps (peak calling, signal coverage tracks, chromatin states).
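The depth capping described above can be illustrated with a minimal Python sketch. It assumes that "subsampling to a maximum depth" means drawing reads uniformly at random without replacement when a dataset exceeds the cap, and leaving smaller datasets untouched; the text does not state the sampling procedure or tool used, so this is an illustration and not the pipeline's actual implementation. The names `MAX_DEPTH` and `subsample_reads` and the `seed` parameter are hypothetical; only the three depth caps come from the text.

```python
import random

# Depth caps in reads, taken from the text. The dictionary keys are
# hypothetical labels chosen for this sketch.
MAX_DEPTH = {
    "histone": 30_000_000,         # consolidated histone mark datasets
    "histone_deep": 45_000_000,    # 7 deeply profiled reference epigenomes
    "dnase": 50_000_000,           # consolidated DNase-seq datasets
}


def subsample_reads(reads, max_depth, seed=0):
    """Return at most `max_depth` reads from `reads`.

    Assumption: reads are drawn uniformly at random without replacement.
    Datasets at or below the cap are returned unchanged, and the original
    read order is preserved in the output.
    """
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    if len(reads) <= max_depth:
        return list(reads)
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    keep = sorted(rng.sample(range(len(reads)), max_depth))
    return [reads[i] for i in keep]


if __name__ == "__main__":
    # Toy example: 100 mock reads capped at 30 (standing in for 30 million).
    mock_reads = [f"read_{i}" for i in range(100)]
    capped = subsample_reads(mock_reads, 30, seed=1)
    print(len(capped))
```

For real alignment files, holding all reads in memory is impractical; a streaming approach such as reservoir sampling, or an existing tool's subsampling option, would be the more realistic choice. The sketch only shows the capping logic.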