Chunk #5 — INTRODUCTION — A universe of data built around targets, diseases and drugs

Source: Open Targets Platform: supporting systematic drug-target identification and prioritisation.
Embedded: yes

Text

Decision-making in a drug discovery project requires a thorough understanding of as many variables as possible to maximise the chance of clinical trial success. The Open Targets Platform therefore aims to provide a comprehensive characterisation of targets, diseases or phenotypes and, more recently, known drugs, to help inform target identification and prioritisation (Figure 1B). To reconstruct these main biomedical concepts, we retrieve information from 26 different data sources (Supplementary Table S1). While most datasets are seamlessly integrated, others require some post-processing. For example, our focus on drug targets implies that any gene product could potentially be targeted, so information from core resources such as Ensembl (8) or UniProt (9) needs to be integrated to cover both RNAs and proteins. Sometimes, more detailed analysis is required to extract the relevant information or to adapt the available data to a clinical setting. To capture the literature available for each entity, for example, we performed named-entity recognition on the available abstracts from Europe PMC (https://link.opentargets.io/). Other recent additions, such as the chemical probes or the target enabling packages, require sustained manual curation because the data are scattered across different resources (10–12).
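
As an illustration of the kind of post-processing involved in combining gene-level and protein-level resources, the following is a minimal sketch, not the Platform's actual pipeline. It assumes two hypothetical record lists (one gene-centric, one protein-centric) that share a gene identifier. The record layout, the `Target` class, the `build_targets` function and all identifiers are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Hypothetical unified target record (not the Platform's schema)."""
    gene_id: str
    biotype: str
    protein_accession: Optional[str] = None  # None for non-coding gene products


def build_targets(gene_records, protein_records):
    """Left-join protein annotations onto gene records by gene ID.

    Every gene is kept, so RNA-only gene products remain in the index
    even when no protein annotation exists for them.
    """
    protein_by_gene = {p["gene_id"]: p["accession"] for p in protein_records}
    return [
        Target(
            gene_id=g["gene_id"],
            biotype=g["biotype"],
            protein_accession=protein_by_gene.get(g["gene_id"]),
        )
        for g in gene_records
    ]


# Placeholder data; these identifiers are made up and not real accessions.
genes = [
    {"gene_id": "GENE_A", "biotype": "protein_coding"},
    {"gene_id": "GENE_B", "biotype": "lncRNA"},
]
proteins = [{"gene_id": "GENE_A", "accession": "PROT_A"}]

targets = build_targets(genes, proteins)
```

Keeping every gene record and attaching protein annotations only where they exist is one simple way to cover both RNAs and proteins in a single target index.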