Neurosymbolic, Multiagent AI Speeds Oncology Clinical Trial Matching Fourfold
A neurosymbolic, multiagent AI system developed by Massive Bio improved the accuracy, throughput, and timeliness of oncology clinical trial matching while maintaining clinician oversight and keeping subgroup performance gaps modest, according to a prospective study published in ESMO Real World Data and Digital Oncology.
The platform, which couples large language model (LLM)-based extraction with ontology-grounded, deterministic eligibility reasoning, achieved fourfold-faster clinical trial matching compared with manual review and appeared to outperform zero-shot GPT-4, chain-of-thought GPT-4, and frontier GPT-4o baselines on matching tasks. No demographic or disease subgroup showed a performance gap exceeding 10 points.
“This publication draws a line,” said lead author Arturo Loaiza-Bonilla, MD, MSEd, FACP, Co-Founder and Chief Medical AI Officer at Massive Bio and Systemwide Chief of Hematology and Oncology at St. Luke's University Health Network, Easton, Pennsylvania, in a Massive Bio press release. “We are no longer debating whether AI can work in oncology clinical trial matching. We are demonstrating how it works, at scale, in routine practice, with transparent and auditable results.”
He added, “The architecture matters as much as the outcomes: neurosymbolic, multiagent systems grounded in domain-specific knowledge graphs are the infrastructure layer oncology has been missing. This is how we begin to close the gap where only 3% to 5% of [patients with cancer] access clinical trials, not because trials do not exist, but because we have failed to operationalize matching at the speed and complexity the disease demands.”
Study and Model Methods
Consecutive patients (n = 3,804) were screened over a 12-month period. All participants had an Eastern Cooperative Oncology Group performance status of 0 to 2, and the cohort was balanced for cancer type incidence, with patients presenting with metastatic or progressive malignancies. The primary analytical unit was the patient–trial pair.
The investigators developed a multiagent architecture comprising domain-tuned LLM-based extraction and reasoning agents, a curated oncology knowledge graph, a prioritization engine, and an expert-curated corpus. Together, these components enabled automated data extraction, harmonization, and trial matching across 157,367 clinical pages (approximately 86.5 million tokens).
For evaluation, two oncologists produced a gold standard of trial eligibility labels using a predefined interpretation protocol (Cohen’s κ = 0.92). System performance was compared against multiple baselines, including manual screening, GPT-4 zero-shot prompting, GPT-4 chain-of-thought prompting, and frontier GPT-4o extraction and matching benchmarks. The evaluated outcomes included sensitivity, specificity, precision, F1 score, calibration of eligibility confidence scores, time-to-recommendation, fairness across demographic subgroups, and operational burden.
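The gold-standard labels above were assessed for interrater agreement with Cohen's κ, which corrects observed agreement for the agreement two raters would reach by chance. A minimal sketch of the statistic follows; the eligibility labels below are hypothetical illustrations, not data from the study:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters' labels over the same items."""
    assert len(a) == len(b)
    n = len(a)
    labels = sorted(set(a) | set(b))
    # Observed agreement: fraction of items both raters label identically.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: expected overlap from each rater's marginal frequencies.
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (po - pe) / (1 - pe)

# Hypothetical binary eligibility calls (1 = eligible) from two oncologists.
rater1 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
rater2 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
print(round(cohens_kappa(rater1, rater2), 2))  # → 0.78
```

A κ of 0.92, as reported in the study, indicates near-perfect agreement and supports using the two oncologists' consensus labels as a reference standard.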
Commenting on the approach, Çağatay M. Çulcuoğlu, Co-Founder, Chief Technology Officer, and Chief Operating Officer at Massive Bio, stated in the press release, “Building AI that performs well in a controlled setting is a solved problem. Building AI that performs reliably across thousands of patients with fragmented, incomplete, and heterogeneous clinical data, that is the engineering challenge this paper addresses. Our three-agent architecture was designed from the ground up to handle the scale, complexity, and safety requirements of real-world oncology. The knowledge graph is not an add-on; it is the backbone that makes the system auditable, deterministic where it must be, and resilient to the noise that defines real clinical data.”
Key Findings
The AI-driven clinical trial matching system achieved an F1 score of 0.82 (95% confidence interval = 0.81–0.83), compared with 0.47 for the GPT-4 zero-shot baseline and 0.67 for the GPT-4 chain-of-thought baseline. Further, the multiagent AI system achieved a balanced sensitivity of 0.8375, specificity of 0.8359, and precision of 0.8121.
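As a quick arithmetic check, F1 is the harmonic mean of precision and sensitivity (recall), and the figures reported above are internally consistent:

```python
precision = 0.8121
recall = 0.8375  # sensitivity

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # → 0.82, matching the reported score
```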
Median per-patient screening time decreased from 120 minutes with manual review to approximately 30 minutes total with the AI system, comprising 15 minutes of automated processing and 15 minutes of clinical review.
The system processed more than 157,000 pages across the cohort, screened 23,912 candidate patient–trial pairs, and produced 17,912 oncologist-confirmed matches, with a median time-to-recommendation of less than 7 days.
No demographic subgroup demonstrated an F1 gap greater than 10 percentage points; the largest disparity, approximately 7 points, was observed between White and Black patients. The researchers suggested that algorithmic fairness at the subgroup level may be linked to data equity.
According to the investigators, ablation experiments conducted on the held-out test set indicated that both knowledge graph grounding and multiagent decomposition “contributed materially” to performance and efficiency. Eligibility confidence scores were reported to demonstrate reasonable calibration in the clinically relevant operating range.
“What differentiates this work is its prospective design and the rigor of the validation framework. We evaluated the system against real oncologist decisions, not curated benchmarks. The result, an F1 of 0.82 across more than 17,000 confirmed matches, reflects what the platform delivers when embedded in actual clinical operations,” study investigator Selin Kurnaz, PhD, Co-Founder and Chief Executive Officer at Massive Bio, concluded in a news statement. “Equally important, we designed the evaluation to surface equity gaps before they become entrenched. AI in oncology must be held to the same evidentiary standard as the therapies it helps deliver.”
DISCLOSURES: No funding was declared. For full disclosures of the study authors, visit esmorwd.org.
ASCO AI in Oncology is published by Conexiant under a license arrangement with the American Society of Clinical Oncology, Inc. (ASCO®). The ideas and opinions expressed in ASCO AI in Oncology do not necessarily reflect those of Conexiant or ASCO. For more information, see Policies.