canSAR 2024-an update to the public drug discovery knowledgebase

Nucleic Acids Res. 2024 Nov 13:gkae1050. doi: 10.1093/nar/gkae1050. Online ahead of print.

Abstract

canSAR (https://cansar.ai) continues to serve as the largest publicly available platform for cancer-focused drug discovery and translational research. It integrates multidisciplinary data from disparate and otherwise siloed public data sources as well as data curated uniquely for canSAR. In addition, canSAR deploys a suite of curation and standardization tools together with AI algorithms to generate new knowledge from these integrated data to inform hypothesis generation. Here we report the latest updates to canSAR. As well as increasing available data, we provide enhancements to our algorithms to improve the offering to the user. Notably, our enhancements include a revised ligandability classifier leveraging Positive Unlabeled Learning that finds twice as many ligandable opportunities across the pocketome, and our revised chemical standardization pipeline and hierarchy better enables the aggregation of structurally related molecular records.