
Chunk #32 — Methods — Computational model to discriminate the promoters of protein-coding and lncRNA genes

Source
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes.

Text

To identify regulatory patterns that may facilitate computational discrimination between the promoters of protein-coding genes and lncRNA genes, we extracted features from several broad categories. These include frequency-based properties of the promoters, such as k-mers, word commonality, skew, and palindromes; regulatory elements, such as CpG islands, repetitive elements, and transcription factor binding sites (TFBS) found within the promoter regions; and epigenetic features, such as chromatin states and individual histone modification marks (see Text S1, Methods section). We used an ensemble of decision trees [77] to generate a classification model and estimated its accuracy with 20-fold cross-validation.
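
As an illustration of the frequency-based feature category and of how a 20-fold split partitions the data, the following is a minimal sketch and not the authors' pipeline. It makes several assumptions: "skew" is taken to mean GC skew, (G - C) / (G + C); "palindromes" are taken to be windows equal to their own reverse complement; the window length of 4 and k = 2 are arbitrary; and all function names (`promoter_features`, `k_fold_indices`, and so on) are hypothetical. Word commonality, the regulatory and epigenetic features, and the tree ensemble itself are not shown.

```python
from collections import Counter
from itertools import product

# Translation table for complementing a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=2):
    """Relative frequency of every possible k-mer over A/C/G/T.

    Windows containing other symbols (e.g. N) still count toward the
    total but are not reported as features.
    """
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts[kmer] / total for kmer in kmers}


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); assumed meaning of 'skew'."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def count_palindromes(seq, length=4):
    """Count windows that equal their own reverse complement."""
    return sum(
        1
        for i in range(len(seq) - length + 1)
        if seq[i:i + length] == reverse_complement(seq[i:i + length])
    )


def promoter_features(seq, k=2):
    """Build one feature dictionary for a promoter sequence."""
    seq = seq.upper()
    features = kmer_frequencies(seq, k)
    features["gc_skew"] = gc_skew(seq)
    features["palindromes"] = count_palindromes(seq)
    return features


def k_fold_indices(n_samples, n_folds=20):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    for fold in range(n_folds):
        test = list(range(fold, n_samples, n_folds))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test
```

In use, each promoter sequence would be passed through `promoter_features` to build a feature matrix, and a classifier would be trained on the `train` indices and scored on the `test` indices of each fold, with the reported accuracy averaged over the 20 folds. The split here is a simple interleaved one; shuffling or stratifying by class before splitting is a common refinement.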