HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery
Shaohan Shi
Haoran Jiang
Yunjie Yao
Chang Jiang
Quan Li

Room: Hall E1
Keywords
Large Language Model, Visual Analytics, Iterative Human-AI Collaboration, Knowledge Graph, Hypothesis Construction
Abstract
Modern scientific discovery encounters significant challenges in integrating the rapidly expanding and heterogeneous body of knowledge required for driving breakthroughs in biomedicine and drug development. While traditional hypothesis-driven research has proven effective, it is constrained by human cognitive limitations, the complexity of biological systems, and the high costs associated with trial-and-error experimentation. Deep learning models, particularly graph neural networks (GNNs), have accelerated scientific progress. However, the sheer volume of predictions they generate makes manual selection for experimental validation impractical. Attempts to leverage large language models (LLMs) for filtering predictions and generating novel hypotheses have been impeded by issues such as hallucinations and the lack of structured knowledge grounding, which undermine their reliability. To address these challenges, we propose HypoChainer, a collaborative visualization framework that integrates human expertise, LLM-driven reasoning, and knowledge graphs (KGs) to enhance scientific discovery and validation visually. HypoChainer operates through three key stages: (1) Exploration and Contextualization: Domain experts employ retrieval-augmented LLMs (RAGs) and dimensionality reduction techniques to extract insights and research entry points from vast GNN predictions, supplemented by interactive explanations for in-depth understanding; (2) Hypothesis Chain Formation: Experts iteratively explore the relationships between KG information relevant to the predictions and semantically linked nodes consistent with the hypothesis, gaining knowledge and insights while refining the hypothesis through suggestions from LLMs and KGs; and (3) Validation Prioritization: Predictions are filtered and prioritized based on the refined hypothesis chains and KG-supported evidence, identifying high-priority candidates for experimental validation. Weak points in the hypothesis chain are further optimized through visual analytics of the retrieval results. We evaluated the effectiveness of HypoChainer in hypothesis construction and scientific discovery through case studies in two distinct domains and expert interviews.
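The Validation Prioritization stage can be made concrete with a small sketch. The abstract does not specify how HypoChainer scores or ranks predictions, so everything below is an illustrative assumption rather than the authors' method: a hypothesis chain is modeled as an ordered list of relation labels, KG support is the number of KG paths from a prediction's source to its target that follow that relation sequence, and predictions are ranked by support first and model score second. All entity names, relation names, scores, and function names are hypothetical.

```python
from collections import defaultdict


def build_index(triples):
    """Index KG triples as (head, relation) -> set of tails."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, relation)].add(tail)
    return index


def count_chain_paths(index, source, target, chain):
    """Count KG paths from source to target whose relations follow `chain` in order."""
    frontier = {source: 1}  # node -> number of matching partial paths ending there
    for relation in chain:
        next_frontier = defaultdict(int)
        for node, n_paths in frontier.items():
            for tail in index.get((node, relation), ()):
                next_frontier[tail] += n_paths
        frontier = next_frontier
    return frontier.get(target, 0)


def prioritize(predictions, triples, chain):
    """Rank (source, target, score) predictions by KG support, then by model score."""
    index = build_index(triples)
    scored = [
        (source, target, score, count_chain_paths(index, source, target, chain))
        for source, target, score in predictions
    ]
    return sorted(scored, key=lambda row: (row[3], row[2]), reverse=True)


# Hypothetical toy knowledge graph and hypothesis chain (assumed for illustration).
triples = [
    ("drugA", "targets", "gene1"),
    ("drugA", "targets", "gene2"),
    ("gene1", "participates_in", "pathwayX"),
    ("gene2", "participates_in", "pathwayX"),
    ("pathwayX", "associated_with", "diseaseD"),
    ("drugB", "targets", "gene3"),
]
chain = ["targets", "participates_in", "associated_with"]

# Hypothetical model predictions: (source, target, model score).
predictions = [("drugB", "diseaseD", 0.95), ("drugA", "diseaseD", 0.80)]

ranked = prioritize(predictions, triples, chain)
for source, target, score, support in ranked:
    print(f"{source} -> {target}: score={score:.2f}, supporting paths={support}")
```

In this toy setup the prediction with the lower model score ranks first because two KG paths match the hypothesis chain, while the higher-scored prediction has none. This reflects the idea of preferring candidates with KG-supported evidence; a real system would likely weigh evidence in a more nuanced way.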