Motif discovery data were from the modENCODE and ENCODE repositories [23,24,60,61], with the exception of Bin, Tin and Twi, which were from Zinzen et al. [2]. Drosophila ChIP data were from Zinzen et al., modENCODE and other published sources [2,24-30]; human ChIP data were from ENCODE [23] (see Tables S1 and S2 in Additional file 2 for details). CTCF multi-individual data were from [16,44]. EPO alignments for 12 mammals were from Ensembl [62,63]; phastCons scores [64] and multiz alignments for 12 Drosophila species were from FlyBase [65,66]. Drosophila variation data were from the DGRP [22] and were additionally filtered as described below. Human variation data were from the 1000 Genomes Pilot Project [21]. Motif matches were detected using patser [67]; where matches overlapped, only the strongest-scoring motif match was retained. Motif matches overlapping ChIP regions ('bound' motifs) were identified using BEDTools [68]. Analyses were performed in R, Python and Perl, using the Ensembl API.
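
The two filtering steps described above (keeping only the strongest-scoring match among overlapping motif matches, then calling a match 'bound' if it overlaps a ChIP region) can be illustrated with a minimal Python sketch. This is not the code used in the study, which relied on the tools cited above. The `Interval` type, the function names, the example coordinates and scores, the greedy highest-score-first rule for resolving chains of overlaps, and the half-open (BED-style) coordinate convention are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical record for a motif match or a ChIP region."""
    chrom: str
    start: int          # 0-based, inclusive (BED-style; an assumption)
    end: int            # exclusive
    score: float = 0.0  # motif match score; unused for ChIP regions


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def keep_strongest(matches: List[Interval]) -> List[Interval]:
    """Resolve overlapping matches by keeping the strongest-scoring one.

    Assumed rule: visit matches from highest to lowest score and keep a
    match only if it does not overlap a match that was already kept.
    """
    kept: List[Interval] = []
    for m in sorted(matches, key=lambda x: x.score, reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda x: (x.chrom, x.start))


def bound_matches(matches: List[Interval], peaks: List[Interval]) -> List[Interval]:
    """Return the matches that overlap at least one ChIP region."""
    return [m for m in matches if any(overlaps(m, p) for p in peaks)]


# Made-up example data, for illustration only.
matches = [
    Interval("chr2L", 100, 110, 8.5),
    Interval("chr2L", 105, 115, 9.2),  # overlaps the match above, higher score
    Interval("chr2L", 200, 210, 7.1),
]
peaks = [Interval("chr2L", 90, 150)]

filtered = keep_strongest(matches)
bound = bound_matches(filtered, peaks)
```

The pairwise comparisons are quadratic in the number of intervals, which is acceptable for a sketch; a genome-scale analysis would use sorted sweeps or a dedicated interval tool.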