Expanding Predictive Capacities in Toxicology: Insights from Hackathon-Enhanced Data and Model Aggregation

Dmitrii O Shkil; Alina A Muhamedzhanova; Philipp I Petrov; Ekaterina V Skorb; Timur A Aliev; Ilya S Steshin; Alexander V Tumanov; Alexander S Kislinskiy; Maxim V Fedorov

doi:10.3390/molecules29081826

Molecules. 2024 Apr 17;29(8):1826. doi: 10.3390/molecules29081826.

Authors

Dmitrii O Shkil¹ ², Alina A Muhamedzhanova¹, Philipp I Petrov³, Ekaterina V Skorb⁴, Timur A Aliev⁴, Ilya S Steshin¹, Alexander V Tumanov¹, Alexander S Kislinskiy¹, Maxim V Fedorov⁵

Affiliations

¹ Syntelly LLC, Moscow 121205, Russia.
² Moscow Institute of Physics and Technology, Moscow 141700, Russia.
³ Medtech.Moscow, Moscow 119571, Russia.
⁴ Infochemistry Scientific Center, ITMO University, Saint-Petersburg 191002, Russia.
⁵ Kharkevich Institute for Information Transmission Problems of Russian Academy of Sciences, Moscow 127994, Russia.

Abstract

In predictive toxicology for small molecules, the applicability domain of quantitative structure-activity relationship (QSAR) models is often limited by how well the training set covers the chemical space. Consequently, classical models fail to provide reliable predictions for wide classes of molecules. However, innovative data collection methods such as intensive hackathons hold promise for quickly expanding the chemical space available for model construction. Combined with algorithmic refinement methods, these tools can address the challenges of toxicity prediction, enhancing both the robustness and applicability of the corresponding models. This study aimed to investigate the roles of gradient boosting and strategic data aggregation in enhancing the predictive ability of models for the toxicity of small organic molecules. We focused on evaluating the impact of incorporating fragment features and expanding the chemical space, facilitated by a comprehensive dataset procured in an open hackathon. We used gradient boosting techniques, accounting for critical features such as the structural fragments or functional groups often associated with manifestations of toxicity.
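The link between training-set coverage and applicability domain can be illustrated with a small sketch. It is not the authors' method: it assumes each molecule is already reduced to a set of fragment labels (the labels and the helper names `tanimoto`, `in_domain`, and `coverage` are hypothetical) and uses one common heuristic, a nearest-neighbor Tanimoto similarity threshold, to decide whether a query molecule falls inside the domain. The threshold of 0.5 is an arbitrary choice for illustration.

```python
from typing import FrozenSet, Iterable, List

# Assumption: a molecule is represented as a set of fragment labels.
Fragments = FrozenSet[str]


def tanimoto(a: Fragments, b: Fragments) -> float:
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def in_domain(query: Fragments, training: List[Fragments],
              threshold: float = 0.5) -> bool:
    """True if at least one training molecule is similar enough to the query."""
    return any(tanimoto(query, t) >= threshold for t in training)


def coverage(queries: Iterable[Fragments], training: List[Fragments],
             threshold: float = 0.5) -> float:
    """Fraction of query molecules that fall inside the applicability domain."""
    queries = list(queries)
    if not queries:
        return 0.0
    hits = sum(in_domain(q, training, threshold) for q in queries)
    return hits / len(queries)


# Hypothetical toy data: fragment labels are illustrative only.
base_training = [
    frozenset({"benzene", "hydroxyl"}),
    frozenset({"benzene", "nitro"}),
]
hackathon_data = [
    frozenset({"amine", "alkyl_chain"}),
    frozenset({"carboxyl", "alkyl_chain"}),
]
queries = [
    frozenset({"benzene", "hydroxyl", "nitro"}),
    frozenset({"amine", "alkyl_chain", "hydroxyl"}),
    frozenset({"thiol", "furan"}),
]

# Coverage with the original set, then with the aggregated set.
before = coverage(queries, base_training)
after = coverage(queries, base_training + hackathon_data)
print(f"coverage before aggregation: {before:.2f}")
print(f"coverage after aggregation:  {after:.2f}")
```

In this toy data, aggregation raises coverage from one third to two thirds of the query molecules, while the third query stays outside the domain because none of its fragments appear in either set.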

Keywords: cheminformatics; deep learning; gradient boosting; hackathon; machine learning; neural networks; toxicity.

MeSH terms

Algorithms*
Humans
Quantitative Structure-Activity Relationship*
Toxicology / methods

Grants and funding

No grant number/Medtech.Moscow