SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations.
- Authors: Zhan, Xiaowei; Liu, Dajiang J
- Year: 2015
- Journal: Genetic Epidemiology
- PMID: 26394715
- DOI: 10.1002/gepi.21918
- PMCID: PMC4794281
Next-generation sequencing has enabled the study of a comprehensive catalogue of genetic variants for their impact on various complex diseases. Numerous consortium studies of complex traits have publicly released their summary association statistics, which have become an invaluable resource for learning the underlying biology, understanding the genetic architecture, and guiding clinical translation. There is great interest in the field in developing novel statistical methods for analyzing and interpreting results from these genotype-phenotype association studies. One popular platform for method development and data analysis is R. To enable these analyses in R, it is necessary to develop packages that can efficiently query files of summary association statistics, explore the linkage disequilibrium structure between variants, and integrate various bioinformatics databases. The complexity and scale of sequence datasets and databases pose significant computational challenges for method developers. To address these challenges and facilitate method development, we developed the R package SEQMINER for annotating and querying files of sequence variants (e.g., VCF/BCF files) and summary association statistics (e.g., METAL/RAREMETAL files), and for integrating bioinformatics databases. SEQMINER provides an infrastructure through which novel methods can be distributed and applied to the analysis of sequence datasets in practice. We illustrate the performance of SEQMINER using datasets from the 1000 Genomes Project. We show that SEQMINER is highly efficient and easy to use. It will greatly accelerate the process of applying statistical innovations to analyze and interpret sequence-based associations. The R package, its source code, and documentation are available from http://cran.r-project.org/web/packages/seqminer and http://seqminer.genomic.codes/.
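The two core operations named in the abstract, querying summary statistics by genomic range and measuring linkage disequilibrium between variants, can be illustrated with a minimal Python sketch. This is not SEQMINER's API and not tabix's actual index format: the record layout, the function names `query_range` and `ld_r2`, and all numeric values are hypothetical. The range query uses a plain binary search over a position-sorted table as a simplified stand-in for indexed retrieval, and LD is computed as the squared Pearson correlation of genotype dosages.

```python
from bisect import bisect_left, bisect_right
from statistics import mean

# Hypothetical summary-statistic records for one chromosome, sorted by position:
# (position, effect size, p-value). All values are made up for illustration.
RECORDS = [
    (100, 0.10, 0.20),
    (250, -0.30, 0.01),
    (400, 0.05, 0.60),
    (900, 0.20, 0.04),
]


def query_range(records, start, end):
    """Return records with start <= position <= end using binary search."""
    # Building this list is O(n); a real tool would keep a prebuilt index.
    positions = [r[0] for r in records]
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return records[lo:hi]


def ld_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")  # monomorphic variant: r^2 is undefined
    return cov * cov / (v1 * v2)


hits = query_range(RECORDS, 200, 500)            # variants at positions 250 and 400
r2_same = ld_r2([0, 1, 2, 1], [0, 1, 2, 1])      # identical genotypes
r2_indep = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])     # uncorrelated genotypes
```

In this toy data, identical genotype vectors give an r² of 1 and the uncorrelated pair gives 0. In R, the package's own range-based readers would take the place of `query_range`; consult the SEQMINER documentation for their actual signatures.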
| Name | Type |
|---|---|
| 1000 Genomes phase 1 project | cohort |
| 1000 Genomes Project | cohort |
| alcohol abuse | phenotype |
| anthropometric traits | phenotype |
| body mass index | phenotype |
| C++ | software |
| CFH | gene |
| complex diseases | phenotype |
| data.table::fread | software |
| DNA sequence variant | variant |
| Drinking addiction | phenotype |
| functional variant | variant |
| gene | gene |
| height | phenotype |
| lipid levels | phenotype |
| METAL | software |
| missense variants | variant |
| R | software |
| RAREMETAL | software |
| read.table | software |
| rvmeta.readDataByRange | software |
| SEQMINER | software |
| sequence variant | variant |
| study cohort | cohort |
| synonymous variant | variant |
| tabix | software |
| tabix.read.table | software |
| tobacco dependence | phenotype |
| trait | phenotype |
| whole chromosome variants | variant |
| Citation | PMID | DOI | PMCID | Status |
|---|---|---|---|---|
| Burgess S, et al. 2014. Using multivariable Mendelian randomization to disentangle the causal effects of lipid fractions. PLoS ONE 9(10):e108891. | 25302496 | 10.1371/journal.pone.0108891 | PMC4193746 | — |
| 1000 Genomes Project Consortium, et al. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422):56–65. | 23128226 | 10.1038/nature11632 | PMC3498066 | — |
| Danecek P, et al. 2011. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158. | 21653522 | 10.1093/bioinformatics/btr330 | PMC3137218 | — |
| Do R, et al. 2013. Common variants associated with plasma triglycerides and risk for coronary artery disease. Nat Genet 45(11):1345–1352. | 24097064 | 10.1038/ng.2795 | PMC3904346 | — |
| Feng S, et al. 2014. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics. | 24894501 | 10.1093/bioinformatics/btu367 | PMC4173011 | — |
| Giambartolomei C, et al. 2014. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet 10(5):e1004383. | 24830394 | 10.1371/journal.pgen.1004383 | PMC4022491 | — |
| Hu YJ, et al. 2013. Meta-analysis of gene-level associations for rare variants based on single-variant statistics. Am J Hum Genet 93(2):236–248. | 23891470 | 10.1016/j.ajhg.2013.06.011 | PMC3738834 | — |
| Kichaev G, et al. 2014. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 10(10):e1004722. | 25357204 | 10.1371/journal.pgen.1004722 | PMC4214605 | — |
| Lee S, et al. 2013. General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet 93(1):42–53. | 23768515 | 10.1016/j.ajhg.2013.05.010 | PMC3710762 | — |
| Li H. 2011. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics 27(5):718–719. | 21208982 | 10.1093/bioinformatics/btq671 | PMC3042176 | — |
| Liu DJ, et al. 2014. Meta-analysis of gene-level tests for rare variant association. Nat Genet 46(2):200–204. | 24336170 | 10.1038/ng.2852 | PMC3939031 | — |
| Price AL, et al. 2010. Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet 86(6):832–838. | 20471002 | 10.1016/j.ajhg.2010.04.005 | PMC3032073 | — |
| Tang ZZ, Lin DY. 2013. MASS: meta-analysis of score statistics for sequencing studies. Bioinformatics 29(14):1803–1805. | 23698861 | 10.1093/bioinformatics/btt280 | PMC3702254 | — |
| Tang ZZ, Lin DY. 2014. Meta-analysis of sequencing studies with heterogeneous genetic associations. Genet Epidemiol 38(5):389–401. | 24799183 | 10.1002/gepi.21798 | PMC4157393 | — |
| Voight BF, et al. 2012. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet 380(9841):572–580. | 22607825 | 10.1016/S0140-6736(12)60312-2 | PMC3419820 | — |
| Wen X, Stephens M. 2014. Bayesian methods for genetic association analysis with heterogeneous subgroups: from meta-analyses to gene-environment interactions. 176–203. | 26413181 | 10.1214/13-AOAS695 | PMC4583155 | — |
| Willer CJ, Li Y, Abecasis GR. 2010. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26(17):2190–2191. | 20616382 | 10.1093/bioinformatics/btq340 | PMC2922887 | — |
In this knowledge base
| Title | Year | PMID |
|---|---|---|
| Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. | 2020 | 30617275 |
External
| Title | Authors | Journal | Year |
|---|---|---|---|
| Cockayne syndrome B protein is implicated in transcription and associated chromatin dynamics in homeostatic and genotoxic conditions. | Liakos A et al. | — | 2025 |
| A combinatorial approach to uncover an additional Integrator subunit. | Offley SR et al. | — | 2023 |
| eQTL Catalogue 2023: New datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs. | Kerimov N et al. | — | 2023 |
| Phosphorylated histone variant γH2Av is associated with chromatin insulators in Drosophila. | Simmons JR et al. | — | 2022 |
| A compendium of uniformly processed human gene expression and splicing quantitative trait loci. | Kerimov N et al. | — | 2021 |
| Androgen and glucocorticoid receptor direct distinct transcriptional programs by receptor-specific and shared DNA binding sites. | Kulik M et al. | — | 2021 |
| EGR1 is a gatekeeper of inflammatory enhancers in human macrophages. | Trizzino M et al. | — | 2021 |
| An adaptive test for meta-analysis of rare variant association studies. | Yang T et al. | — | 2020 |
| Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. | Erzurumluoglu AM et al. | — | 2020 |
| Rapid and Scalable Profiling of Nascent RNA with fastGRO. | Barbieri E et al. | — | 2020 |
| Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. | Liu M et al. | — | 2019 |
| Computational Analysis of HLA-presentation of Non-synonymous Recipient Mismatches Indicates Effect on the Risk of Chronic Graft-vs.-Host Disease After Allogeneic HSCT. | Ritari J et al. | — | 2019 |
| Early chromatin shaping predetermines multipotent vagal neural crest into neural, neuronal and mesenchymal lineages. | Ling ITC et al. | — | 2019 |
| Exome Chip Meta-analysis Fine Maps Causal Variants and Elucidates the Genetic Architecture of Rare Coding Variants in Smoking and Alcohol Use. | Brazel DM et al. | — | 2019 |
| Making Sense of the Epigenome Using Data Integration Approaches. | Cazaly E et al. | — | 2019 |
| Reconstruction of the Global Neural Crest Gene Regulatory Network In Vivo. | Williams RM et al. | — | 2019 |
| The Mega2R package: R tools for accessing and processing genetic data in common formats | Baron RV et al. | — | 2019 |
| Hidden genomic MHC disparity between HLA-matched sibling pairs in hematopoietic stem cell transplantation. | Koskela S et al. | — | 2018 |
| The Mega2R package: R tools for accessing and processing genetic data in common formats. | Baron RV et al. | — | 2018 |
| The Tumor Suppressor ARID1A Controls Global Transcription via Pausing of RNA Polymerase II. | Trizzino M et al. | — | 2018 |
| Exome-wide association study of plasma lipids in >300,000 individuals. | Liu DJ et al. | — | 2017 |
| Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association. | Grinde KE et al. | — | 2017 |
| FREGAT: an R package for region-based association analysis. | Belonogova NM et al. | — | 2016 |
| RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. | Zhan X et al. | — | 2016 |