Hepatotoxicity is a major cause of drug withdrawal from the market. To reduce drug attrition caused by hepatotoxicity, an accurate and efficient hepatotoxicity prediction system is needed. In the present study, we combined machine learning with adverse hepatic effects (AHEs) at different levels to construct a three-level hepatotoxicity prediction system, using (1) a single end point, hepatotoxicity; (2) four hepatotoxicity severity degrees; and (3) specific AHEs. After collecting and curating 15 873 compound-AHE pairs covering 2017 compounds and 403 AHEs, we used the random forest algorithm to construct 27 models across the three end point levels. These models achieved accuracies from 67.0% to 78.2% and areas under the receiver operating characteristic curve (AUCs) from 0.715 to 0.875. The 27 models were fully integrated into a tiered hepatotoxicity prediction system, so that the existence of hepatotoxicity, its severity degree, and potential AHEs can be inferred simultaneously and systematically for a given compound. The tiered system therefore allows researchers to confirm compound hepatotoxicity with considerable confidence, analyze hepatotoxicity from multiple perspectives, obtain warnings about potential hepatotoxicity severity, and even rapidly select appropriate in vitro experiments for hepatotoxicity verification. We also used three external sets (11 drugs or candidates that failed in clinical trials or were withdrawn from the market, the PharmGKB (offsides) database, and an herbal hepatotoxicity data set) to test and validate the predictive ability of our system. Furthermore, the hepatotoxicity prediction system was implemented as a workflow in the Konstanz Information Miner (KNIME) and made available to researchers.
Keywords: SAR; adverse hepatic effects; hepatotoxicity; random forest; toxicity risk assessment.
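The accuracy and AUC values reported above summarize how well each binary model separates hepatotoxic from non-hepatotoxic compounds. The following is a minimal sketch, not the authors' evaluation code, of how both metrics can be computed from true labels and predicted probabilities. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions. AUC is computed in its pairwise form: the probability that a randomly chosen positive compound scores higher than a randomly chosen negative one, with ties counted as 0.5.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative).

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg), which is
    fine for illustration but slow for very large data sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels (1 = hepatotoxic) and random-forest-style
    # probability scores; these are NOT data from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
    # Assumed decision threshold of 0.5 to turn scores into labels.
    y_pred = [int(s >= 0.5) for s in scores]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 4 of 6 correct
    print(f"AUC      = {roc_auc(y_true, scores):.3f}")   # 8 of 9 pairs ranked correctly
```

Accuracy depends on the chosen threshold, whereas AUC uses only the ranking of scores. This is why the two metrics are typically reported together for classifiers such as random forests, which output class probabilities.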