Exploring the hydrate landscape using data mining on the Cambridge structural database (CSD)

Int J Pharm. 2024 Dec 17:671:125075. doi: 10.1016/j.ijpharm.2024.125075. Online ahead of print.

Abstract

Drug hydrates remain highly relevant in the pharmaceutical sciences, so a comprehensive understanding of hydrate and anhydrate forms is essential, not only through individual case studies but also from a broader, systematic perspective. The Cambridge Structural Database (CSD) is a well-established database of crystal structures of organic molecules; here, it was used to explore the structural features of pharmaceutically relevant compounds that form hydrates. Drug anhydrate and hydrate subsets were generated and further classified into separate anhydrate and hydrate sets for free drug, cocrystal/solvate, salt, multicomponent cocrystal/solvate, and salt cocrystal/solvate systems. These sets were analyzed and documented at the molecular and structural levels. In the CSD drug subset, 24% of entries are hydrates and 76% are anhydrates. Only 6% of anhydrates have corresponding hydrate forms in the CSD drug subset. Hydrate formation appears to be less well documented for multicomponent drug systems, and hydrate polymorphism is likewise less explored for these increasingly complex systems with a high number of components. The presence of water molecules or additional components does not necessarily lead to a higher degree of crystal packing. Water is involved in 44% of hydrogen bonds (H-bonds) in the drug hydrate set, where it preferentially acts as an H-bond donor. H-bonds formed only by water show a relatively high bond strength. This work demonstrates the potential of data science for analyzing pharmaceutically relevant databases to uncover hidden patterns and, more specifically, of using the CSD to understand structural aspects and the role of water in H-bond patterns in drug hydrates.
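The subset statistics quoted in the abstract (share of hydrates, share of anhydrates, and share of anhydrates with a corresponding hydrate) can be illustrated with a minimal sketch. It is not the authors' workflow and does not use the CSD or its API: the `Entry` record, its fields, the `summarize` helper, and the toy data are all hypothetical, and matching anhydrates to hydrates by a shared drug label is an assumption, since the abstract does not state the matching criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one crystal structure entry (not a CSD type)."""
    refcode: str     # made-up identifier, for illustration only
    drug: str        # assumed label shared by all forms of the same drug
    has_water: bool  # True if the structure contains water (a hydrate)


def summarize(entries):
    """Return hydrate/anhydrate fractions and the fraction of anhydrate
    entries whose drug also appears in at least one hydrate entry."""
    hydrates = [e for e in entries if e.has_water]
    anhydrates = [e for e in entries if not e.has_water]

    # Assumption: an anhydrate "has a corresponding hydrate" when the same
    # drug label occurs among the hydrate entries.
    hydrate_drugs = {e.drug for e in hydrates}
    matched = [e for e in anhydrates if e.drug in hydrate_drugs]

    total = len(entries)
    return {
        "hydrate_fraction": len(hydrates) / total if total else 0.0,
        "anhydrate_fraction": len(anhydrates) / total if total else 0.0,
        "anhydrates_with_hydrate_fraction": (
            len(matched) / len(anhydrates) if anhydrates else 0.0
        ),
    }


# Toy data only; these are not real CSD entries or real statistics.
TOY_ENTRIES = [
    Entry("TOY001", "drug_a", False),
    Entry("TOY002", "drug_a", True),
    Entry("TOY003", "drug_b", False),
    Entry("TOY004", "drug_c", False),
]

if __name__ == "__main__":
    for name, value in summarize(TOY_ENTRIES).items():
        print(f"{name}: {value:.2f}")
```

Counting per entry rather than per unique compound is a further simplification; the two conventions can give different percentages when a drug has many polymorphs.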

Keywords: Crystallography; Data science; Database; Drug anhydrate; Drug hydrate.