This pipeline makes it possible to group compounds that share the same canonical tautomer, enabling users to identify bioactivity data relating to their chemical of interest regardless of its original representation. Figure 3A shows the structural family for the anti-hormonal drug tamoxifen. First, the relationship between the dosed ingredient, which is a pro-drug, and the active form of the compound, the major 4-hydroxy metabolite afimoxifene, is displayed. Then, the parent compound, stripped of stereochemistry, allows different enantiomers to be grouped. Importantly, the canonical tautomers enable the user to group together compounds that are chemically equivalent but are represented differently in source databases. Moreover, the grouping system facilitates the identification of related enantiomers and molecules with undefined stereochemistry. While the bioactivity data for the individual chemical structures are not mixed, the system alerts the user to related compounds that may hold useful information. As a result, canSAR now contains >3 million bioactive small molecules linked through these hierarchies to all related bioactivity data, 3D structures and, where available, clinical trials.
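
The grouping behaviour described in this paragraph can be illustrated with a minimal Python sketch. It is not the canSAR implementation. It assumes that a canonical-tautomer key and a stereo-stripped parent key have already been computed upstream for every structure. The class `Compound`, the functions `build_families` and `related_compounds`, and all identifiers and keys in the example (`src1:001`, `T1`, `P1` and so on) are hypothetical placeholders. The sketch shows only the two properties stated above: records are linked through shared keys, and bioactivity data stay attached to the individual structure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Compound:
    compound_id: str   # hypothetical source-database identifier
    tautomer_key: str  # canonical-tautomer key, assumed precomputed upstream
    parent_key: str    # key of the stereo-stripped parent, assumed precomputed
    activities: list = field(default_factory=list)  # never merged across records


def build_families(compounds):
    """Index compounds by parent key, then by canonical-tautomer key."""
    families = defaultdict(lambda: defaultdict(list))
    for c in compounds:
        families[c.parent_key][c.tautomer_key].append(c)
    return families


def related_compounds(families, query):
    """Return the IDs of other records in the same structural family.

    Only identifiers are returned, so the caller is alerted to related
    compounds without their bioactivity data being pooled.
    """
    family = families[query.parent_key]
    same_tautomer = [
        c.compound_id
        for c in family[query.tautomer_key]
        if c.compound_id != query.compound_id
    ]
    same_parent = [
        c.compound_id
        for key, group in family.items()
        if key != query.tautomer_key
        for c in group
    ]
    return {"same_tautomer": same_tautomer, "same_parent": same_parent}


# Placeholder records: a and b share a canonical tautomer, c shares only
# the parent, and d belongs to an unrelated family.
a = Compound("src1:001", "T1", "P1", ["assay-A"])
b = Compound("src2:042", "T1", "P1", ["assay-B"])
c = Compound("src3:007", "T2", "P1")
d = Compound("src1:900", "T9", "P9")

families = build_families([a, b, c, d])
related = related_compounds(families, a)
```

Here `related` lists `src2:042` as sharing the canonical tautomer of `a` and `src3:007` as sharing only its parent, while the `activities` list of each record is left unchanged.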