LLMs Predict Progression to Cancer in Colitis-Associated Low-Grade Dysplasia
Low-grade dysplasia can be an early warning sign of colorectal cancer, but only a fraction of cases progress to cancer. Patients with ulcerative colitis who develop low-grade dysplasia face an increased risk of progression to advanced neoplasia, yet determining which individuals will develop high-grade dysplasia or colorectal cancer remains a major clinical challenge.
In a new study published in Clinical Gastroenterology and Hepatology, Johnson et al reported that a fully automated AI pipeline using large language models (LLMs) can accurately stratify future risk of advanced neoplasia in patients with colitis-associated low-grade dysplasia.
Model Methods
To construct the study cohort, the researchers applied open-weight LLMs to more than 5 million free-text clinical notes and pathology reports in the U.S. Veterans Affairs (VA) national database, integrating these data with structured sources such as ICD and CPT codes and the VA cancer registry. From this automated pipeline, 55,450 longitudinal histories of patients with ulcerative colitis were reconstructed, and 2,939 patients with an index low-grade dysplasia diagnosis between 1999 and 2024 met eligibility criteria. Patients were followed from the date of index low-grade dysplasia until development of advanced neoplasia (defined as high-grade dysplasia and/or colorectal cancer) or censoring due to colectomy, death, or last follow-up.
To establish phenotypes, the AI system extracted four established clinicopathologic risk factors from colonoscopy and pathology reports: low-grade dysplasia size ≥ 1 cm, incomplete resection or invisible low-grade dysplasia, multifocal low-grade dysplasia, and moderate or severe endoscopic inflammation. The researchers then applied the previously validated UC-CaRE risk model without refitting.
Manual chart review confirmed 88% to 93% accuracy of LLM-derived predictions in a validation cohort.
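The four risk factors described above lend themselves to a simple tally per patient. The following is a minimal Python sketch of that counting step only, not the study's pipeline or the UC-CaRE model itself (the article does not give its coefficients). The `IndexLesion` class, its field names, the inflammation labels, and the reading of "multifocal" as more than one dysplastic site are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexLesion:
    """Hypothetical record of findings at index low-grade dysplasia."""
    size_cm: float              # largest lesion dimension, in cm
    completely_resected: bool   # False if resection was incomplete
    visible: bool               # False if dysplasia was endoscopically invisible
    n_dysplastic_sites: int     # number of distinct dysplastic sites
    inflammation: str           # assumed labels: "none", "mild", "moderate", "severe"


def count_risk_factors(lesion: IndexLesion) -> int:
    """Count how many of the four risk factors named in the article are present."""
    factors = [
        lesion.size_cm >= 1.0,                                  # size >= 1 cm
        (not lesion.completely_resected) or (not lesion.visible),  # incomplete resection or invisible
        lesion.n_dysplastic_sites > 1,                          # multifocal (assumed: more than one site)
        lesion.inflammation in {"moderate", "severe"},          # moderate or severe inflammation
    ]
    return sum(factors)


# A small, visible, fully resected, unifocal lesion with mild inflammation
low = IndexLesion(0.5, True, True, 1, "mild")
# A large, incompletely resected, multifocal lesion with severe inflammation
high = IndexLesion(2.0, False, True, 3, "severe")
print(count_risk_factors(low), count_risk_factors(high))
```

A count of 0 to 4 yields five possible groups, which matches the five risk groups reported in the study.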
Key Takeaways
Over the course of 20,279 patient-years of follow-up, 209 of the 2,939 patients (7.1%) progressed to advanced neoplasia. Kaplan-Meier analysis demonstrated highly significant separation among five risk groups, defined by how many of the four established risk factors (dysplasia size, lesion resection completeness and visibility, number of dysplastic sites, and severity of inflammation) were present at index low-grade dysplasia (P < .0001). At 5 years, the risk of advanced neoplasia ranged from 2.5% (95% confidence interval [CI] = 1.6%–3.4%) in the lowest-risk group to 27.8% (95% CI = 0.4%–47.6%) in the highest-risk group.
Nearly half of all patients fell into the lowest-risk category, and the AI pipeline accurately predicted that 98.9% of those individuals would remain free of advanced neoplasia within 24 months. In contrast, patients with unresectable visible low-grade dysplasia had a 5-year cumulative advanced neoplasia risk of 37.4% (95% CI = 21.7%–50%), nearly double the risk often perceived by clinicians. Model calibration remained good through 10 years of follow-up.
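For readers unfamiliar with Kaplan-Meier analysis, the sketch below shows in plain Python how such an estimate handles patients who are censored (for example, by colectomy, death, or last follow-up) before an event occurs. It is a generic textbook estimator run on made-up toy data, not the study's code or data; the function name `kaplan_meier` and the input values are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if advanced neoplasia occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the fraction of at-risk patients who stayed event-free
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set after time t
        at_risk -= n_leaving
    return curve


# Toy data (years of follow-up; 1 = progressed, 0 = censored)
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 0, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: event-free {s:.2f}, cumulative risk {1 - s:.2f}")
```

The cumulative risk at a given time point, such as the 5-year figures quoted above, corresponds to 1 minus the event-free probability at that time.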
The authors concluded, “our automated AI pipeline … provides accurate cancer risk predictions in [patients with] [ulcerative colitis and low-grade dysplasia] to aid decision-making. Advances in LLMs and prompt design will enable further improvements in data-driven clinical decisions based on quantitative forecasts of future risk.”
Brian Johnson, MD, of the Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California, is the corresponding author for the Clinical Gastroenterology and Hepatology article.
DISCLOSURES: The study was funded by the U.S. Veterans Affairs Biomedical Laboratory Research and Development Service. Dr. Shah is a paid ad hoc consultant for RedHill Biopharma and Phathom Pharmaceuticals, and an unpaid scientific advisory board member for Ilico Genetics, Inc. The other study authors reported no competing interests.
ASCO AI in Oncology is published by Conexiant under a license arrangement with the American Society of Clinical Oncology, Inc. (ASCO®). The ideas and opinions expressed in ASCO AI in Oncology do not necessarily reflect those of Conexiant or ASCO. For more information, see Policies.
Performance of a convolutional neural network in determining differentiation levels of cutaneous squamous cell carcinomas was on par with that of experienced dermatologists, according to the results of a recent study published in JAAD International.
“This type of cancer, which is a result of mutations of the most common cell type in the top layer of the skin, is strongly linked to accumulated [ultraviolet] radiation over time. It develops in sun-exposed areas, often on skin already showing signs of sun damage, with rough scaly patches, uneven pigmentation, and decreased elasticity,” stated lead researcher Sam Polesie, MD, PhD, Associate Professor of Dermatology and Venereology at the University of Gothenburg and Practicing Dermatologist at Sahlgrenska University Hospital, both in Gothenburg, Sweden.