Chunk #14 — INTRODUCTION — Statistical genetics analysis

Source: Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics.
Embedded: yes

Text

with molecular quantitative trait loci, genomic distance and chromatin interaction data) to generate L2G predictive features. We then train a supervised model to predict the causal gene at each locus, using over 400 gold-standard positive GWAS loci for which we are confident of the implicated gene (see https://github.com/opentargets/genetics-gold-standards). Note that the existing gold-standard genes are likely to be biased towards those that lie near the centre of the GWAS peak and that have clear (nonsynonymous) variant consequences, and this bias will influence the features learned by the L2G model. We intend to continue expanding the repository of gold-standard loci to enable building the most accurate model possible for gene prioritisation. The machine learning method is described in more detail in the online documentation (https://genetics-docs.opentargets.org/our-approach/pipeline-overview).
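
The supervised step described above can be illustrated with a minimal sketch. This is not the L2G implementation: a plain logistic regression stands in for the supervised model, and the three features (proximity, QTL colocalisation, chromatin interaction), their 0 to 1 scaling, the gene names and all values are hypothetical toy data. It also assumes that the non-gold-standard genes at a gold-standard locus are treated as negatives, which the text does not state.

```python
import math

# ASSUMPTION: hypothetical feature order [proximity, qtl_coloc, chromatin_interaction],
# each scaled to the range 0..1. All rows are invented toy data, not real loci.
# Each row: (locus_id, gene, features, label); label 1 marks the gold-standard gene.
TRAINING_ROWS = [
    ("locus1", "GENE_A", [0.90, 0.80, 0.70], 1),
    ("locus1", "GENE_B", [0.40, 0.10, 0.20], 0),
    ("locus1", "GENE_C", [0.10, 0.00, 0.10], 0),
    ("locus2", "GENE_D", [0.80, 0.20, 0.90], 1),
    ("locus2", "GENE_E", [0.30, 0.30, 0.00], 0),
    ("locus3", "GENE_F", [0.95, 0.90, 0.10], 1),
    ("locus3", "GENE_G", [0.50, 0.20, 0.30], 0),
    ("locus3", "GENE_H", [0.20, 0.10, 0.00], 0),
]


def sigmoid(z):
    """Map a real-valued score to a probability-like value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Return the model score for one candidate gene."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(rows, epochs=2000, learning_rate=0.5):
    """Fit logistic regression by full-batch gradient descent on the log loss."""
    n_features = len(rows[0][2])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for _locus, _gene, features, label in rows:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Average the gradients over all rows before taking a step.
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def rank_genes(weights, bias, candidates):
    """Score every candidate gene at a locus and sort from highest to lowest."""
    scored = [(gene, predict(weights, bias, feats)) for gene, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


weights, bias = train(TRAINING_ROWS)

# A hypothetical unseen locus with two candidate genes.
new_locus = {
    "GENE_X": [0.85, 0.70, 0.60],
    "GENE_Y": [0.30, 0.10, 0.10],
}
ranking = rank_genes(weights, bias, new_locus)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

Because every positive row in this toy set has a high proximity value, the fitted weights lean heavily on that feature. This mirrors, in miniature, the bias the text warns about: a model trained on gold standards that cluster near the centre of the GWAS peak will learn to favour nearby genes.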