Drug research has increasingly become a field of data science. High-throughput technologies such as robotic drug screening assays, in silico screening/docking, automated compound library generation, high-throughput genotyping and various combinations of multi-omics studies are generating huge amounts of drug-related or drug–target data. This growing volume of drug data is driving greater use of artificial intelligence (AI) and machine learning (ML), both of which use ‘big data’ to identify novel patterns and trends. Data-driven AI has already made significant contributions to early-stage drug discovery, ranging from target identification, lead identification/optimization and drug repurposing to chemical synthesis optimization and even the prediction of important drug properties such as efficacy, toxicity and drug–drug interactions (1–3). Data science is also impacting other areas of pharmaceutical practice. Clinicians, pharmacists and other healthcare professionals (HCPs) require accurate, up-to-date drug data to make informed decisions. Often, the sheer volume of drug–gender, drug–response, drug–genotype, drug–drug and drug–food interaction data is overwhelming, and the increasing use of automated clinical decision support (CDS) platforms reflects the data challenges faced by HCPs (4). As a