Chunk #26 — Methods — Data. — TF ChIP–seq data.

Source: Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements.
Embedded: yes

Text

On 15 October 2015, we downloaded all available TF chromatin immunoprecipitation followed by sequencing (ChIP–seq) data derived from human primary cells or cell lines and deposited in the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) (n = 13,732 datasets). We retained accessions for which input ChIP–seq (control) data had also been generated and made public (n = 3,181 of 13,732). We downloaded raw sequencing data in SRA format from NCBI GEO, converted them to FASTQ format with the SRA Toolkit function fastq-dump, assessed the quality of the sequencing reads with FastQC and finally mapped reads to the human genome (hg19/GRCh37) with Bowtie2 (v.2.2.5) using default parameters. Each ChIP–seq dataset was matched to its corresponding control data, and peaks were called with MACS (v.2.1) at a q value <0.01 under a bimodal model, producing 3,181 BED-formatted files (refs. 32,39). For compatibility with the IMPACT method, we selected TFs with a known sequence motif, as recorded in the MEME database. Of the 442 TFs represented by the 3,181 TF ChIP–seq datasets, only 142 matched a known sequence motif, narrowing down the total number