ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements.
- Authors
- Taylor, James; Tyekucheva, Svitlana; King, David C; Hardison, Ross C; Miller, Webb; Chiaromonte, Francesca
- Year
- 2006
- Journal
- Genome Research
- PMID
- 17053093
- DOI
- 10.1101/gr.4537706
- PMCID
- PMC1665643
Genomic sequence signals - such as base composition, presence of particular motifs, or evolutionary constraint - have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).
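The idea of scoring an alignment through a reduced representation can be illustrated with a toy sketch. This is not the ESPERR implementation: the abstract does not give the encoding search or the model details, so everything here is an assumption made for illustration. The hand-written two-species mapping (`toy_mapping`), the first-order Markov models, the additive smoothing, and all function names are hypothetical. ESPERR learns its encoding from training examples, whereas this sketch fixes one by hand.

```python
import math
from collections import Counter
from itertools import product

# Reduced alphabet (assumed for illustration): S = identical G/C, W = identical A/T,
# M = mismatch, G = gap in either species, ? = any column not covered by the mapping.
ALPHABET = "SWMG?"

def toy_mapping():
    """Hand-written encoding of two-species alignment columns into reduced symbols."""
    mapping = {}
    for a, b in product("ACGT-", repeat=2):
        if "-" in (a, b):
            sym = "G"
        elif a != b:
            sym = "M"
        elif a in "GC":
            sym = "S"
        else:
            sym = "W"
        mapping[(a, b)] = sym
    return mapping

def encode(columns, mapping, unknown="?"):
    """Map each alignment column (one base per species) to its reduced symbol."""
    return [mapping.get(col, unknown) for col in columns]

def train_markov(sequences, alphabet, pseudocount=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1
    probs = {}
    for prev in alphabet:
        total = sum(counts[(prev, cur)] for cur in alphabet) + pseudocount * len(alphabet)
        for cur in alphabet:
            probs[(prev, cur)] = (counts[(prev, cur)] + pseudocount) / total
    return probs

def log_odds(seq, pos_model, neg_model):
    """Sum of per-transition log ratios: positive-class model over negative-class model."""
    return sum(
        math.log(pos_model[(prev, cur)] / neg_model[(prev, cur)])
        for prev, cur in zip(seq, seq[1:])
    )
```

In this sketch, one model would be trained on encoded alignments of known elements and the other on encoded neutral sites; a positive `log_odds` value then means the encoded window resembles the positive training set more than the negative one.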
In this knowledge base
External
| Title | Authors | Journal | Year | Link |
|---|---|---|---|---|
| Genetic Polymorphisms of the Telomerase Reverse Transcriptase Gene in Relation to Prostate Tumorigenesis, Aggressiveness and Mortality: A Cross-Ancestry Analysis. | Zhan Y et al. | - | 2023 | - |
| Telomere length and hTERT genetic variants as potential prognostic markers in multiple myeloma. | Dratwa M et al. | - | 2023 | - |
| Assessment of Telomerase Reverse Transcriptase Single Nucleotide Polymorphism in Sleep Bruxism. | Macek P et al. | - | 2022 | - |
| TERT rs2736100 and TERC rs16847897 genotypes moderate the association between internalizing mental disorders and accelerated telomere length attrition among HIV+ children and adolescents in Uganda. | Kalungi A et al. | - | 2021 | - |
| An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. | Xiang G et al. | - | 2020 | - |
| Association between Telomere-Related Polymorphisms and the Risk of IPF and COPD as a Precursor Lesion of Lung Cancer: Findings from the Fukuoka Tobacco-Related Lung Disease (FOLD) Registry. | Arimura-Omori M et al. | - | 2020 | - |
| In memory of James Taylor: the birth of Galaxy. | Nekrutenko A et al. | - | 2020 | - |
| An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis | Xiang G et al. | - | 2019 | - |
| A *Gata3* 3' Distal Otic Vesicle Enhancer Directs Inner Ear-Specific *Gata3* Expression. | Moriguchi T et al. | - | 2018 | - |
| Deletions at SLC18A1 increased the risk of CRC and lower SLC18A1 expression associated with poor CRC outcome. | Zhang D et al. | - | 2017 | - |
| Genetic insights into juvenile idiopathic arthritis derived from deep whole genome sequencing. | Wong L et al. | - | 2017 | - |
| miR-26b promoter analysis reveals regulatory mechanisms by lipid-related transcription factors in goat mammary epithelial cells. | Wang H et al. | - | 2017 | - |
| An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters. | Ramsey SA | - | 2015 | - |
| Association of genetic polymorphisms in the telomerase reverse transcriptase gene with prostate cancer aggressiveness. | Wu D et al. | - | 2015 | - |
| Combined Whole Methylome and Genomewide Association Study Implicates CNTN4 in Alcohol Use. | Clark SL et al. | - | 2015 | - |
| DNase I hypersensitivity analysis of the mouse brain and retina identifies region-specific regulatory elements. | Wilken MS et al. | - | 2015 | - |
| Multiple Changes of Gene Expression and Function Reveal Genomic and Phenotypic Complexity in SLE-like Disease. | Wilbe M et al. | - | 2015 | - |
| Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility. | Dogan N et al. | - | 2015 | - |
| Genetic variants in telomerase reverse transcriptase (TERT) and telomerase-associated protein 1 (TEP1) and the risk of male infertility. | Yan L et al. | - | 2014 | - |
| Genetic variation in MKL2 and decreased downstream PCTAIRE1 expression in extreme, fatal primary human microcephaly. | Ramos EI et al. | - | 2014 | - |
| Severe osteoarthritis of the hand associates with common variants within the ALDH1A2 gene and with rare variants at 1p31. | Styrkarsdottir U et al. | - | 2014 | - |
| White matter abnormalities in 22q11.2 deletion syndrome: preliminary associations with the Nogo-66 receptor gene and symptoms of psychosis. | Perlstein MD et al. | - | 2014 | - |
| Genetic variants of MARCO are associated with susceptibility to pulmonary tuberculosis in a Gambian population. | Bowdish DM et al. | - | 2013 | - |
| Multiple independent variants at the TERT locus are associated with telomere length and risks of breast and ovarian cancer. | Bojesen SE et al. | - | 2013 | - |
| Neurotransmitter systems and neurotrophic factors in autism: association study of 37 genes suggests involvement of DDC. | Toma C et al. | - | 2013 | - |
| Segmenting the human genome based on states of neutral genetic divergence. | Kuruppumullage Don P et al. | - | 2013 | - |
| TERT genetic polymorphism rs2736100 was associated with lung cancer: a meta-analysis based on 14,492 subjects. | Wang HM et al. | - | 2013 | - |
| TERT polymorphisms modify the risk of acute lymphoblastic leukemia in Chinese children. | Sheng X et al. | - | 2013 | - |
| CD6 and syntaxin binding protein 6 variants and response to tumor necrosis factor alpha inhibitors in Danish patients with rheumatoid arthritis. | Krintel SB et al. | - | 2012 | - |
| Characterization of enhancer function from genome-wide analyses. | Maston GA et al. | - | 2012 | - |
| Conserved Motifs and Prediction of Regulatory Modules in Caenorhabditis elegans. | Zhao G et al. | - | 2012 | - |
| Copy number variation at 6q13 functions as a long-range regulator and is associated with pancreatic cancer risk. | Huang L et al. | - | 2012 | - |
| Functional variants in NFKBIE and RTKN2 involved in activation of the NF-κB pathway are associated with rheumatoid arthritis in Japanese. | Myouzen K et al. | - | 2012 | - |
| Genomic approaches towards finding cis-regulatory modules in animals. | Hardison RC et al. | - | 2012 | - |
| Identification and characterization of Hoxa9 binding sites in hematopoietic cells. | Huang Y et al. | - | 2012 | - |
| LIM domain only 2 protein expression, LMO2 germline genetic variation, and overall survival in diffuse large B-cell lymphoma in the pre-rituximab era. | Cerhan JR et al. | - | 2012 | - |
| PRKCB is associated with calcineurin inhibitor-induced renal dysfunction in heart transplant recipients. | Lachance K et al. | - | 2012 | - |
| Some phenotype association tools in Galaxy: looking for disease SNPs in a full genome. | Giardine BM et al. | - | 2012 | - |
| Telomerase reverse transcriptase locus polymorphisms and cancer risk: a field synopsis and meta-analysis. | Mocellin S et al. | - | 2012 | - |
| A 3-bp deletion in the HBS1L-MYB intergenic region on chromosome 6q23 is associated with HbF expression. | Farrell JJ et al. | - | 2011 | - |
| A genome-wide view of mutation rate co-variation using multivariate analyses. | Ananda G et al. | - | 2011 | - |
| An NK and T cell enhancer lies 280 kilobase pairs 3' to the gata3 structural gene. | Hosoya-Ohmura S et al. | - | 2011 | - |
| Computational identification of transcriptional regulators in human endotoxemia. | Nguyen TT et al. | - | 2011 | - |
| DNA methylation patterns in luminal breast cancers differ from non-luminal subtypes and can identify relapse risk independent of other clinical variables. | Kamalakaran S et al. | - | 2011 | - |
| Genetic effects at pleiotropic loci are context-dependent with consequences for the maintenance of genetic variation in populations. | Lawson HA et al. | - | 2011 | - |
| Genome-wide association study of recurrent early-onset major depressive disorder. | Shi J et al. | - | 2011 | - |
| Genome-wide association study of theta band event-related oscillations identifies serotonin receptor gene HTR7 influencing risk of alcohol dependence. | Zlojutro M et al. | - | 2011 | - |
| Haplotype block structure of the genomic region of the mu opioid receptor gene. | Levran O et al. | - | 2011 | - |
| Long range regulation of human FXN gene expression. | Puspasari N et al. | - | 2011 | - |
| Novel loci for major depression identified by genome-wide association study of Sequenced Treatment Alternatives to Relieve Depression and meta-analysis of three studies. | Shyn SI et al. | - | 2011 | - |
| When needles look like hay: how to find tissue-specific enhancers in model organism genomes. | Haeussler M et al. | - | 2011 | - |
| Allelic variation at the 8q23.3 colorectal cancer risk locus functions as a cis-acting regulator of EIF3H. | Pittman AM et al. | - | 2010 | - |
| An integrated expression phenotype mapping approach defines common variants in LEP, ALOX15 and CAPNS1 associated with induction of IL-6. | Fairfax BP et al. | - | 2010 | - |
| ChIP-chip analysis of neurexins and other candidate genes for addiction and neuropsychiatric disorders. | Pedrosa E et al. | - | 2010 | - |
| Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. | Schmidt D et al. | - | 2010 | - |
| Genetic architecture of ambulatory blood pressure in the general population: insights from cardiovascular gene-centric array. | Tomaszewski M et al. | - | 2010 | - |
| Genetic risk factors for post-infectious irritable bowel syndrome following a waterborne outbreak of gastroenteritis. | Villani AC et al. | - | 2010 | - |
| Hypoxia regulates BMP4 expression in the murine spleen during the recovery from acute anemia. | Wu DC et al. | - | 2010 | - |
| Lost in the space of bioinformatic tools: a constantly updated survival guide for genetic epidemiology. The GenEpi Toolbox. | Coassin S et al. | - | 2010 | - |
| A common variant associated with dyslexia reduces expression of the KIAA0319 gene. | Dennis MY et al. | - | 2009 | - |
| A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. | Landi MT et al. | - | 2009 | - |
| Bidirectional translational research: Progress in understanding addictive diseases. | Kreek MJ et al. | - | 2009 | - |
| Common variants in the NLRP3 region contribute to Crohn's disease susceptibility. | Villani AC et al. | - | 2009 | - |
| Complexity reduction in context-dependent DNA substitution models. | Majoros WH et al. | - | 2009 | - |
| Conserved domains of the class A scavenger receptors: evolution and function. | Bowdish DM et al. | - | 2009 | - |
| Expression of the leukemia oncogene Lmo2 is controlled by an array of tissue-specific elements dispersed over 100 kb and bound by Tal1/Lmo2, Ets, and Gata factors. | Landry JR et al. | - | 2009 | - |
| Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data. | Corcoran DL et al. | - | 2009 | - |
| Genome-wide colonization of gene regulatory elements by G4 DNA motifs. | Du Z et al. | - | 2009 | - |
| Graded repression of PU.1/Sfpi1 gene transcription by GATA factors regulates hematopoietic cell fate. | Chou ST et al. | - | 2009 | - |
| Integrating sequence, evolution and functional genomics in regulatory genomics. | Vingron M et al. | - | 2009 | - |
| It takes (LMO) 2 to tango. | Hardison RC | - | 2009 | - |
| A GATA-1-regulated microRNA locus essential for erythropoiesis. | Dore LC et al. | - | 2008 | - |
| An optimized procedure for the design and evaluation of Ecotilling assays. | Coassin S et al. | - | 2008 | - |
| A single SNP in an evolutionary conserved region within intron 86 of the HERC2 gene determines human blue-brown eye color. | Sturm RA et al. | - | 2008 | - |
| Chromatin profiling across the human tumour necrosis factor gene locus reveals a complex, cell type-specific landscape with novel regulatory elements. | Taylor JM et al. | - | 2008 | - |
| Comparative analyses of bidirectional promoters in vertebrates. | Yang MQ et al. | - | 2008 | - |
| Disruption of neurexin 1 associated with autism spectrum disorder. | Kim HG et al. | - | 2008 | - |
| Human-macaque comparisons illuminate variation in neutral substitution rates. | Tyekucheva S et al. | - | 2008 | - |
| Identification of conserved regulatory elements in mammalian promoter regions: a case study using the PCK1 promoter. | Liu GE et al. | - | 2008 | - |
| Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b. | McGaughey DM et al. | - | 2008 | - |
| Probabilistic inference of transcription factor binding from multiple data sources. | Lähdesmäki H et al. | - | 2008 | - |
| Sequence variants in the PLEKHH2 region are associated with diabetic nephropathy in the GoKinD study population. | Greene CN et al. | - | 2008 | - |
| Transcriptional enhancement by GATA1-occupied DNA segments is strongly associated with evolutionary constraint on the binding site motif. | Cheng Y et al. | - | 2008 | - |
| 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. | Miller W et al. | - | 2007 | - |
| Advancing translational research with the Semantic Web. | Ruttenberg A et al. | - | 2007 | - |
| Computation and analysis of genomic multi-sequence alignments. | Blanchette M | - | 2007 | - |
| Finding cis-regulatory elements using comparative genomics: some lessons from ENCODE data. | King DC et al. | - | 2007 | - |
| Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees. | Chen X et al. | - | 2007 | - |
| Reliable prediction of regulator targets using 12 Drosophila genomes. | Kheradpour P et al. | - | 2007 | - |
| Sequences conserved by selection across mouse and human malaria species. | Imamura H et al. | - | 2007 | - |
| Experimental validation of predicted mammalian erythroid cis-regulatory modules. | Wang H et al. | - | 2006 | - |