Multimodal Deep Learning for First-Line Immunotherapy Response in Gastric Cancer
Multimodal deep learning may help identify which patients with gastric cancer are most likely to benefit from first-line immunotherapy, according to findings presented at the American Association for Cancer Research (AACR) Annual Meeting 2026 (Abstract 6728).
“Most existing studies rely on unimodal data sets,” said presenting author Jingyuan Wang, MD, PhD, of Zhongshan Hospital, Fudan University, Shanghai. “Multimodal frameworks provide promising opportunities for a deeper understanding of tumor biology and for more accurate prediction of treatment outcomes.”
The clinical-radio-pathomic (CRP) model outperformed unimodal models based on radiology or pathology data alone, as well as the PD-L1 combined positive score biomarker. In addition, tumors from the model-predicted partial response group showed significant enrichment of immune-related biologic processes.
Study and Model Details
This multicenter, retrospective diagnostic study included patients with pathologically confirmed gastric cancer treated with first-line anti–PD-(L)1 therapy. The primary cohort was drawn from Fudan University Zhongshan Hospital and included 181 patients with pretreatment contrast-enhanced CT imaging and 163 with hematoxylin and eosin–stained tissue data. Of this population, 84 patients were assigned to training (n = 67) and internal validation (n = 17) sets. External validation cohorts were obtained from the First Affiliated Hospital of Nanchang University (n = 52) and the Xiamen branch of Zhongshan Hospital (n = 17). A subset of 23 patients from Fudan University Zhongshan Hospital with available RNA sequencing data was used for biologic analyses.
The investigators derived pathologic features from whole-slide images using clustering-constrained attention multiple instance learning (CLAM) and a Graph Neural Network–Transformer framework; additional features were extracted using an open-source radiomics software package. Radiomic features were extracted from CT images with the same software, using the finalized regions of interest obtained after manual tumor delineation.
The CRP self-attention–based fusion model integrated pathomics, radiomics, and clinical features to predict treatment response. Interpretability analyses included cell-type quantification and transcriptional profiling.
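As a rough illustration of what self-attention–based fusion means, the sketch below treats each modality (clinical, radiomic, pathomic) as one embedding vector, lets the three vectors attend to one another, and pools the result into a single response probability. It is not the investigators' implementation: the embedding size, the random weights, the mean pooling, and every function and variable name are assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_modalities(tokens, w_q, w_k, w_v, w_out):
    """Single-head self-attention over modality embeddings.

    tokens: array of shape (3, d), one row per modality
            (clinical, radiomic, pathomic), hypothetical inputs.
    Returns the predicted response probability and the 3x3 attention map.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot-product scores
    attn = softmax(scores, axis=-1)           # each row sums to 1
    attended = attn @ v                       # modality-aware embeddings
    pooled = attended.mean(axis=0)            # assumed pooling: simple mean
    logit = pooled @ w_out
    prob = 1.0 / (1.0 + np.exp(-logit))       # sigmoid for a binary outcome
    return prob, attn


# Random stand-ins for learned embeddings and weights (illustration only).
rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(3, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
w_out = rng.normal(size=d) * 0.1

prob, attn = fuse_modalities(tokens, w_q, w_k, w_v, w_out)
print(f"predicted response probability: {prob:.3f}")
print("attention weights (rows = query modality):")
print(attn.round(3))
```

In a trained model, the attention map would indicate how strongly each modality draws on the others for a given patient, which is one reason attention-based fusion is often preferred over simply concatenating features.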
Multimodal Model Predicts Treatment Response
The unimodal radiomics model predicted tumor response with an area under the curve (AUC) of 0.96 in the training cohort and 0.78 in the validation cohort; its association with progression-free and overall survival, based on predicted partial vs non–partial responder status, was significant in the training cohort (P = .007 and P = .019, respectively) but not in the validation cohort (P = .963 and P = .254). The pathomics-only model predicted tumor response with an AUC of 0.95 in the training cohort and 0.81 in the validation cohort. In the training cohort, the pathomics-based prediction was significantly associated with progression-free survival (P = .020) but not with overall survival (P = .077); in the validation cohort, it was significantly associated with neither progression-free (P = .527) nor overall (P = .190) survival.
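For readers less familiar with the AUC, it can be read as the probability that a randomly chosen responder receives a higher model score than a randomly chosen nonresponder. The minimal sketch below computes it that way on invented toy data; the labels, scores, and function name are assumptions for illustration and are unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (responder, nonresponder) pairs ranked correctly.

    labels: 1 for partial response, 0 otherwise (toy encoding).
    scores: model-predicted probabilities of response.
    Ties between a responder and a nonresponder count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one nonresponder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data: three responders, three nonresponders.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

A gap between training and validation AUC, like the one reported for the unimodal models, is the usual sign that a model has fit its training set more closely than it generalizes.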
The CRP model achieved an AUC of 0.97 for tumor response prediction in the training cohort. Patients classified as predicted partial responders demonstrated significantly longer progression-free (P = .016) and overall (P = .032) survival compared with non–partial responders. These findings were further supported in the validation cohort (AUC = 0.86; progression-free survival: P = .019; overall survival: P = .043).
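The report does not state which test produced the survival P values. A common choice for comparing two groups is the log-rank test, sketched below under that assumption. The follow-up times, event flags, and function name are invented for illustration.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value).

    times_*:  follow-up times (e.g., months).
    events_*: 1 if the event (progression or death) was observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0   # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 df
    return chi2, p_value


# Invented toy cohorts: predicted partial vs non-partial responders.
pr_times, pr_events = [12, 15, 18, 20, 24], [1, 0, 1, 0, 0]
npr_times, npr_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(pr_times, pr_events, npr_times, npr_events)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

With cohorts as small as the internal validation set (n = 17), such a test has limited power, which is worth keeping in mind when reading nonsignificant validation results for the unimodal models.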
According to Dr. Wang, in multivariable analyses adjusting for age, sex, and PD-L1 expression, the CRP model remained significantly associated with both progression-free and overall survival in the training and validation cohorts. Predictive performance was maintained in the two external cohorts from the First Affiliated Hospital of Nanchang University (AUC = 0.79; progression-free survival: P = .002; overall survival: P = .005) and the Xiamen branch of Zhongshan Hospital (AUC = 0.92; progression-free survival: P = .021; overall survival: P = .031).
Across the training, validation, and external test cohorts, the CRP model outperformed the unimodal radiomics and pathomics models, including in precision-recall analyses addressing class imbalance. Confusion matrices showed consistent discrimination between predicted partial responders and non–partial responders across all cohorts, which, according to Dr. Wang, indicates “robust predictive performance and good generalizability.”
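Precision-recall analysis is useful when one class is less common than the other, because its baseline is the prevalence of responders rather than a fixed 0.5. The sketch below computes average precision, one common summary of a precision-recall curve, on invented toy data. The data, the function name, and the choice of average precision as the summary metric are assumptions, as the report does not specify which precision-recall metric was used.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each responder's rank.

    Cases are sorted from highest to lowest score. Tied scores keep their
    input order, a simplification that is fine for this toy example.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / n_pos


# Invented imbalanced toy data: 3 responders among 8 patients.
toy_pr_labels = [1, 0, 1, 0, 0, 0, 0, 1]
toy_pr_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_ap = average_precision(toy_pr_labels, toy_pr_scores)
toy_prevalence = sum(toy_pr_labels) / len(toy_pr_labels)
print(f"average precision = {toy_ap:.3f} (chance level = {toy_prevalence:.3f})")
```

Comparing average precision against the responder prevalence, rather than against 0.5, shows how much a model adds beyond guessing in an imbalanced cohort.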
CRP Model Interpretability
The predicted partial response group showed higher whole-slide image–derived cell-type fractions compared with the predicted non–partial response group.
Gene expression profiling demonstrated significant differences between predicted partial responders and non–partial responders. Based on Reactome analysis, genes enriched in the predicted non–partial responders were significantly associated with immune inhibitory pathways, including CD22-mediated B-cell receptor regulation, interleukin-10 synthesis, and inflammatory signaling pathways such as NF-κB.
Gene set enrichment analysis further revealed enrichment of interferon-γ/α response pathways in predicted partial responders, whereas non–partial responders showed enrichment of lipid-related pathways, which Dr. Wang noted have previously been reported to be associated with immune suppression.
DISCLOSURES: Dr. Wang reported no conflicts of interest. For full disclosures of the other study authors, visit aacr.org.
ASCO AI in Oncology is published by Conexiant under a license arrangement with the American Society of Clinical Oncology, Inc. (ASCO®). The ideas and opinions expressed in ASCO AI in Oncology do not necessarily reflect those of Conexiant or ASCO.