Chunk #36 — Materials and methods — Data sources and basic analysis

Source
Analysis of variation at transcription factor binding sites in Drosophila and humans.

Text

Motif discovery data were from the modENCODE and ENCODE repositories [23,24,60,61], with the exception of Bin, Tin and Twi, which were from Zinzen et al. [2]. Drosophila ChIP data were from Zinzen et al., modENCODE and other published sources [2,24-30]; human ChIP data were from ENCODE [23] (see Tables S1 and S2 in Additional file 2 for details). CTCF multi-individual data were from [16,44]. EPO alignments for 12 mammals were from Ensembl [62,63]; phastCons scores [64] and multiz alignments for 12 Drosophila species were from FlyBase [65,66]. Drosophila variation data were from the DGRP [22], additionally filtered as described below. Human variation data were from the 1000 Genomes Pilot Project [21]. Motif matches were detected using patser [67] (where matches overlapped, only the strongest-scoring motif was retained), and overlaps with ChIP regions ('bound' motifs) were called using BEDTools [68]. Analysis was performed in R, Python and Perl using the Ensembl API.
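
The two interval operations described in this paragraph, keeping only the strongest of any overlapping motif matches and then calling a motif 'bound' when it overlaps a ChIP region, can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used patser and BEDTools. The names `Interval`, `keep_strongest` and `bound_motifs` are hypothetical, the coordinates and scores are invented for illustration, and two points are assumptions because the text does not specify them: intervals are treated as half-open (BED-style), and overlaps are resolved greedily from the highest score down.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int          # 0-based, inclusive (BED-style; an assumption)
    end: int            # exclusive
    score: float = 0.0  # motif match score; unused for ChIP regions


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def keep_strongest(matches: List[Interval]) -> List[Interval]:
    """Greedy filter (an assumed tie-breaking scheme): visit matches from
    highest to lowest score and keep a match only if it overlaps no match
    that has already been kept."""
    kept: List[Interval] = []
    for m in sorted(matches, key=lambda x: x.score, reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    # Return in genomic order for readability.
    return sorted(kept, key=lambda x: (x.chrom, x.start))


def bound_motifs(matches: List[Interval],
                 chip_regions: List[Interval]) -> List[Interval]:
    """Motif matches that overlap at least one ChIP region."""
    return [m for m in matches if any(overlaps(m, r) for r in chip_regions)]


# Invented example data, for illustration only.
matches = [
    Interval("chr2L", 100, 110, 7.5),
    Interval("chr2L", 105, 115, 9.1),  # overlaps the match above, scores higher
    Interval("chr2L", 300, 310, 6.0),
]
chip_regions = [Interval("chr2L", 90, 200)]

filtered = keep_strongest(matches)
bound = bound_motifs(filtered, chip_regions)
```

Here `filtered` retains the matches at 105-115 and 300-310, and `bound` contains only the match at 105-115. The pairwise comparison is quadratic in the number of intervals, which is fine for a sketch; a genome-scale run would normally sort the intervals and sweep through them, or delegate to a dedicated interval tool.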