Chunk #33 — Methods — Transcription factor binding sites (TFBSs) enrichment

Source
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes.

Text

We predicted TFBSs in the promoters of both protein-coding and lncRNA genes using 426 position weight matrices (PWMs) for 401 human TFs from the HOCOMOCO database (v.8) [33] (http://www.cbrc.kaust.edu.sa/hocomoco/Download.php). Because it is unclear to what extent the original nucleotide composition of promoters is a cause or a consequence of the TFBS repertoires present in them, we applied the same strategy to both protein-coding and lncRNA promoters. For each PWM, the score threshold was chosen so that a random word generated by a background model (independent nucleotides drawn with the nucleotide frequencies of hg19) obtained a PWM score no less than the threshold with a fixed probability of 0.0005. We then generated 426 binary features per promoter, one per PWM: 0 if the promoter sequence contained no hit reaching the threshold on either strand, and 1 if it contained at least one. Finally, we selected TFBSs that were significantly overrepresented in promoters of protein-coding genes relative to promoters of lncRNA genes, and vice versa (p-value ≤ 0.05, right-sided Fisher's exact test with Benjamini-Hochberg correction for multiple testing to control the false discovery rate (FDR) [78]; see the Methods section of Text S1).
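
The three steps described above (a per-PWM score threshold from a background model, a binary hit feature over both strands, and a right-sided Fisher's exact test with Benjamini-Hochberg adjustment) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is described in Text S1 and may differ in detail. All function names are hypothetical, the background frequencies are placeholders rather than the actual hg19 values, PWMs are assumed to be lists of per-position score dictionaries, and sequences are assumed to contain only A, C, G and T.

```python
import math
from collections import defaultdict

# Placeholder background frequencies (assumption: NOT the real hg19 values).
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def score_threshold(pwm, p_target=0.0005, background=BACKGROUND, decimals=2):
    """Smallest score t with P(score >= t) <= p_target for a random word.

    The score distribution is built position by position under the
    independent-nucleotide background; scores are rounded to `decimals`
    places to keep the number of distinct states small (an approximation).
    Returns math.inf if no score is rare enough.
    """
    dist = {0.0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for nt, freq in background.items():
                nxt[round(score + column[nt], decimals)] += prob * freq
        dist = nxt
    tail, threshold = 0.0, math.inf
    for score in sorted(dist, reverse=True):  # accumulate the upper tail
        if tail + dist[score] > p_target:
            break
        tail += dist[score]
        threshold = score
    return threshold


def has_hit(seq, pwm, threshold):
    """Binary feature: 1 if any window on either strand scores >= threshold."""
    width = len(pwm)
    revcomp = "".join(COMPLEMENT[nt] for nt in reversed(seq))
    for strand in (seq, revcomp):
        for i in range(len(strand) - width + 1):
            if sum(pwm[j][strand[i + j]] for j in range(width)) >= threshold:
                return 1
    return 0


def fisher_right_tail(a, b, c, d):
    """Right-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: promoters in set 1 with / without a hit;
    c, d: promoters in set 2 with / without a hit.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(math.comb(row1, k) * math.comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / math.comb(n, col1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a PWM would count as overrepresented in set 1 when its adjusted p-value is at most 0.05; swapping the two rows of the table tests the opposite direction. The hypergeometric tail and the adjustment are written out with the standard library only to keep the example self-contained; statistical packages provide equivalents (for example `scipy.stats.fisher_exact` with `alternative="greater"`).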