BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation

bioRxiv [Preprint]. 2024 Dec 9:2024.12.05.627020. doi: 10.1101/2024.12.05.627020.

Abstract

Artificial intelligence (AI) is revolutionizing scientific discovery because of its super capability, following the neural scaling laws, to integrate and analyze large-scale datasets to mine knowledge. Foundation models, large language models (LLMs) and large vision models (LVMs), are among the most important foundations paving the way for general AI by pre-training on massive domain-specific datasets. Different from the well annotated, formatted and integrated large textual and image datasets for LLMs and LVMs, biomedical knowledge and datasets are fragmented with data scattered across publications and inconsistent databases that often use diverse nomenclature systems in the field of AI for Precision Health and Medicine (AI4PHM). These discrepancies, spanning different levels of biomedical organization from genes to clinical traits, present major challenges for data integration and alignment. To facilitate foundation AI model development and applications in AI4PHM, herein, we developed BioMedGraphica , an all-in-one platform and unified text-attributed knowledge graph (TAKG), consists of 3,131,788 entities and 56,817,063 relations, which are obtained from 11 distinct entity types and harmonizes 29 relations/edge types using data from 43 biomedical databases. All entities and relations are labeled a unique ID and associated with textual descriptions (textual features). Since covers most of research entities in AI4PHM, BioMedGraphica supports the zero-shot or few-shot knowledge discoveries via new relation prediction on the graph. Via a graphical user interface (GUI), researchers can access the knowledge graph with prior knowledge of target functional annotations, drugs, phenotypes and diseases (drug-protein-disease-phenotype), in the graph AI ready format. It also supports the generation of knowledge-multi-omic signaling graphs to facilitate the development and applications of novel AI models, like LLMs, graph AI, for AI4PHM science discovery, like discovering novel disease pathogenesis, signaling pathways, therapeutic targets, drugs and synergistic cocktails.

Publication types

  • Preprint