Chunk #2 — DATA IN CANSAR — Biological data

Source: canSAR: update to the cancer translational research and drug discovery knowledgebase.
Embedded: yes

Text

canSAR contains the entire human proteome (20 375 sequences) from the UniProt Swiss-Prot (10) database (release 2020_04), as well as >542 000 non-human sequences. canSAR now contains significantly more data of all types, as well as novel data. We have increased the number of molecular profiling studies and now capture multi-omic profiling data on >25 000 cancer patients from large-scale cancer omics initiatives (e.g. TCGA (11), ICGC and TARGET (12)). The data derive from 94 studies across 26 cancer types, with a recent focus on increasing coverage of advanced and metastatic, rare and childhood cancers. We perform significant standardisation of the data across studies, curate the data and annotate them with the most appropriate clinical classification systems. For example, although most studies have TNM (TNM Classification of Malignant Tumors) and grade information, we also utilise Gleason prognostic scores for prostate cancer; FAB, Ann Arbor staging and Binet for blood cancers; and FIGO for ovarian cancers. The data now include >9 900 000 protein-coding mutation data points, >107 million gene-level copy number alterations, >218 million gene expression profiles from