Chunk #13 — INTRODUCTION — Statistical genetics analysis

Source: Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics.
Embedded: yes

Text

To assign likely causal genes to a given variant, we initially developed a disease-agnostic Variant to Gene (V2G) analysis pipeline, which provides a single aggregated score for each variant-gene prediction. This analysis combines four data types: molecular phenotype quantitative trait loci datasets (eQTLs and pQTLs), chromatin interaction and conformation datasets, in silico functional predictions (using the Variant Effect Predictor, or VEP, score), and distance from the canonical transcript start site (21). The data harmonisation and aggregation process, as well as the weighting applied to each dataset, are described at https://genetics-docs.opentargets.org/our-approach/data-pipeline. More recently, we have developed a disease-specific gene prioritisation approach, the Locus to Gene (L2G) score, which uses a machine learning model to prioritise genes at all trait-associated loci. For this, we integrate fine-mapping credible set analysis across all 133,441 loci with functional genomics data (including pathogenicity prediction, colocalisation with molecular quantitative trait loci, genomic distance and chromatin interaction data) to generate L2G predictive features. We then train a supervised model using over 400 gold-standard positive GWAS loci for which we are confident of the implicated gene to predict
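As an illustration of how a single aggregated V2G score per variant-gene pair might be formed from the four data types, the following is a minimal sketch that assumes a simple weighted mean of per-source scores scaled to [0, 1]. The weight values, the source keys, and the function names `aggregate_v2g` and `rank_genes` are hypothetical and chosen only for this example; the weights actually applied are those given in the pipeline documentation linked above.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for the four evidence types (illustrative only;
# these are NOT the weights used by the actual V2G pipeline).
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "qtl": 1.0,                    # eQTL / pQTL evidence
    "chromatin_interaction": 1.0,  # chromatin interaction and conformation
    "vep": 1.0,                    # in silico functional prediction
    "distance": 0.5,               # distance to canonical transcript start site
}


def aggregate_v2g(
    scores: Dict[str, float],
    weights: Dict[str, float] = EXAMPLE_WEIGHTS,
) -> float:
    """Weighted mean of per-source scores in [0, 1].

    Assumption: a source with no evidence for this variant-gene pair
    contributes a score of 0.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(w * scores.get(source, 0.0) for source, w in weights.items())
    return weighted_sum / total_weight


def rank_genes(
    per_gene_scores: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Rank candidate genes for one variant by aggregated score, highest first."""
    ranked = [(gene, aggregate_v2g(s)) for gene, s in per_gene_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Usage: two hypothetical candidate genes for a single variant.
example = {
    "GENE_A": {"qtl": 0.8, "distance": 0.4},
    "GENE_B": {"distance": 1.0},
}
ranking = rank_genes(example)
```

In this sketch GENE_A ranks above GENE_B because its QTL evidence carries more weight than proximity alone, which is the kind of trade-off a weighted aggregation is meant to express.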