e. Quality Control
For the uniformly reprocessed and consolidated ChIP-seq and DNase-seq datasets, strand cross-correlation measures were used to estimate signal-to-noise ratios (phantompeakqualtools; https://code.google.com/p/phantompeakqualtools/; ref. 37). Datasets for each mark were rank-ordered by the normalized strand cross-correlation coefficient (NSC) and flagged if their scores were significantly below the median value or fell within the range of NSC values observed for whole-cell extract (WCE) controls. Consolidated datasets with extremely low sequencing depth (< 10M reads) were also flagged. Each standardized epigenome was then manually assigned a subjective quality flag of 1 (high), 0 (medium) or −1 (low), based on the number of flagged datasets it contained. The SPOT, FindPeaks and Poisson quality scores were also recomputed for the consolidated datasets. We observed high correlations of the NSC scores with the SPOT scores (Pearson correlation of 0.7) and with the FindPeaks scores (Pearson correlation of 0.65). All QC measures are provided in Table S1 (Sheets QCSummary and AdditionalQCScores).
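As an illustration only, the flagging logic described above might be sketched as follows. The text specifies the 10M-read depth cutoff but not a numeric criterion for "significantly below the median" or the mapping from flagged-dataset counts to quality flags, and the final flags were assigned manually. The `margin`, `control_nsc_max`, `medium_at` and `low_at` parameters, the function names and the example values are therefore hypothetical placeholders, not the values or code used in this study.

```python
from statistics import median

MIN_READS = 10_000_000  # depth cutoff stated in the text (< 10M reads)


def flag_datasets(datasets, control_nsc_max, margin):
    """Return the ids of flagged datasets for a single mark.

    datasets: list of dicts with keys 'id', 'nsc' and 'reads'.
    control_nsc_max: ASSUMED upper end of the NSC range seen in WCE controls.
    margin: ASSUMED distance below the median NSC that counts as
            "significantly below"; the source gives no numeric rule.
    """
    median_nsc = median(d["nsc"] for d in datasets)
    flagged = set()
    for d in datasets:
        low_nsc = d["nsc"] < median_nsc - margin      # well below the median
        control_like = d["nsc"] <= control_nsc_max    # within the control range
        shallow = d["reads"] < MIN_READS              # extremely low depth
        if low_nsc or control_like or shallow:
            flagged.add(d["id"])
    return flagged


def epigenome_quality(n_flagged, medium_at=1, low_at=3):
    """Map a count of flagged datasets to 1 (high), 0 (medium) or -1 (low).

    The thresholds are ASSUMED for illustration; in the study this
    assignment was manual and subjective.
    """
    if n_flagged >= low_at:
        return -1
    if n_flagged >= medium_at:
        return 0
    return 1


# Made-up example values for one mark (not data from the study).
example = [
    {"id": "A", "nsc": 1.30, "reads": 30_000_000},
    {"id": "B", "nsc": 1.25, "reads": 8_000_000},   # shallow
    {"id": "C", "nsc": 1.02, "reads": 25_000_000},  # control-like NSC
    {"id": "D", "nsc": 1.28, "reads": 40_000_000},
]
flagged = flag_datasets(example, control_nsc_max=1.03, margin=0.1)
quality = epigenome_quality(len(flagged))
```

In this sketch a dataset is flagged if any one of the three criteria holds, which is one reading of the description above; a stricter or mark-specific rule would change only the body of `flag_datasets`.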