Machine Learning–Derived Plasma Protein Signature May Enable Lung Cancer Prediction Years Before Diagnosis
An international collaboration of more than 80 investigators across four continents has identified a panel of blood proteins that predicted future lung cancer risk more than 5 years before diagnosis and outperformed current risk assessment models. The investigators also reported that use of an existing anti-inflammatory drug was associated with a lower risk of lung cancer among individuals with elevated levels of these proteins.
“Collectively, our work highlights the power of integrating machine learning–driven analyses of population-scale proteomics with biologic insights from preclinical models and clinical trial data, providing a framework for therapeutic interception and heralding a future for precision cancer prevention,” they wrote in a report of their findings, published in Cell.
Developing the Protein Signature
The investigators developed a machine learning model to predict future lung cancer diagnoses using plasma proteomic data from more than 48,000 participants in the UK Biobank Pharma Proteomics Project, including 375 cases of lung cancer. The analysis included 2,923 proteins measured at baseline and linked these data with subsequent cancer registry outcomes, with a median of 5.6 years between blood collection and lung cancer diagnosis. Using a 75:25 train-test split, recursive feature elimination was applied to identify a parsimonious predictive model of incident lung cancer cases (ie, a variable-selection method was used to identify the smallest set of factors that most accurately predicted the development of new lung cancer cases).
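For readers unfamiliar with the approach, the split-then-select procedure described above can be sketched in a few lines. This is an illustrative sketch only, not the investigators' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of proteomic measurements, and swaps in logistic regression as the ranking estimator. The sample size, feature count, and `N_KEEP` value are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a protein matrix (NOT UK Biobank data):
# 2,000 "participants", 50 "proteins", roughly 10% positive cases.
X, y = make_classification(
    n_samples=2000,
    n_features=50,
    n_informative=8,
    n_redundant=0,
    weights=[0.9, 0.1],
    random_state=0,
)

# 75:25 train-test split, stratified so both parts keep the case rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive feature elimination: fit the estimator, drop the weakest
# feature, and refit, until N_KEEP features remain.
N_KEEP = 14  # hypothetical target size chosen for this example
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=N_KEEP,
    step=1,
)
selector.fit(X_train, y_train)

# Column indices of the features that survived elimination.
selected = [i for i, keep in enumerate(selector.support_) if keep]

# Evaluate the reduced model on the held-out 25%.
test_scores = selector.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_scores)

print("Selected feature indices:", selected)
print("Held-out AUC:", round(auc, 3))
```

Holding out the test set before any feature selection matters here: choosing features on the full data set and then testing on part of it would give an overly optimistic estimate of performance.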
The final model included 14 plasma proteins together with 4 clinical variables associated with future lung cancer incidence: age, smoking status, pack-years of smoking, and prior chronic obstructive pulmonary disease. The selected proteins represented several biologic pathways, including inflammatory signaling (CXCL17, CDCP1, GDF15, PIGR, TNFSF13B, and PLAUR), extracellular matrix remodeling (MMP12), epithelial secretion or shedding (CEACAM5, WFDC2, ALPP, and PRSS8), and pulmonary surfactant production (LAMP3, SFTPD, and SFTPA1).
Extreme Gradient Boosting (XGBoost) classification and ranking models based on the 14 proteins alone were compared with clinical-only models and with models combining the proteins and clinical variables. The combined model performed significantly better than either the protein-only or the clinical-only model, whereas those two models performed comparably to each other.
Linking the Signature to Lung Inflammation
To further assess the identified protein markers, the investigators evaluated their associations with lung cancer incidence in eight additional proteomic cohorts from multiple countries. Across a combined data set of 2,198 cancer cases and 53,641 controls, all 14 proteins showed positive associations with future lung cancer incidence in a random-effects meta-analysis. The investigators found that the signature was elevated in current smokers and in individuals exposed to particulate matter; the signature was also linked to lung myeloid and alveolar cell types.
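As background on how per-cohort estimates are pooled in a random-effects meta-analysis, the sketch below implements the DerSimonian-Laird estimator, one common choice. The report summarized here does not state which estimator the investigators used, so this is an assumption, and the log odds ratios and variances in the example are invented for illustration rather than taken from the study.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-cohort effects (e.g., log odds ratios) with a random-effects model.

    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures how much the cohorts disagree.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-cohort variance (tau^2), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each cohort's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical log odds ratios for one protein in nine cohorts (made-up values).
log_ors = [0.31, 0.22, 0.40, 0.18, 0.35, 0.27, 0.15, 0.44, 0.29]
variances = [0.004, 0.010, 0.020, 0.015, 0.008, 0.012, 0.030, 0.025, 0.006]

pooled, se, tau2 = dersimonian_laird(log_ors, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print("Pooled odds ratio:", round(math.exp(pooled), 2))
print("95% CI:", round(math.exp(low), 2), "to", round(math.exp(high), 2))
print("tau^2:", round(tau2, 4))
```

A pooled odds ratio above 1 with a confidence interval excluding 1 would correspond to the kind of positive association reported for each of the 14 proteins.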
In mouse and cell models, EGFR-driven lung adenocarcinoma showed convergence of diverse epithelial lineages on a keratin 8/claudin 4–positive alveolar transitional (KAC) state—an injury-associated adaptive phenotype that may promote tumor formation in the presence of oncogenic mutations—whose transcriptional programs correlated with emergence of the signature. Signature components were induced by particulate matter, oncogenic EGFR, or interleukin (IL)-1β, while inhibition of IL-1β reduced particulate matter–driven expansion of the KAC state and early tumor development.
“We’ve shown that the signature reflects an altered inflammatory lung environment before cancer takes hold,” explained lead author Tej Pandya, MD, a PhD candidate at University College London/The Francis Crick Institute, in a news statement. “It’s a proof of concept that, one day, we could use this signature to offer preventive treatment to people at risk of lung cancer.”
Linking Lung Inflammation to Cancer Prevention
The investigators then applied the signature to a cohort from the prior randomized Canakinumab Anti-inflammatory Thrombosis Outcome Study (CANTOS), a clinical trial of the IL-1β inhibitor canakinumab, to explore its potential for lung cancer prevention. CANTOS had shown reduced lung cancer incidence and mortality in the canakinumab arm, but the study authors noted that the high number needed to treat to prevent one case of lung cancer limited the drug's use in unselected populations. Reanalysis of more than 4,600 participants showed that the reduction was concentrated among the roughly 2,300 participants with high baseline signature levels, with risk nearly halved in this group (2.1% [canakinumab] vs 3.9% [placebo]; odds ratio = 0.52; P = .013) and a corresponding reduction in the number needed to treat.
“Restricting treatment to this high-risk group meant you would only need to treat 55 people [compared with 1,516 in the low-signature group] to prevent one case of lung cancer, a level comparable to established cardiovascular prevention approaches such as statins,” stated senior author Charles Swanton, MBPhD, FRCP, FMedSci, FAACR, FRS, Clinical Director at The Francis Crick Institute.
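The relationship between the reported event rates and the number needed to treat can be checked with simple arithmetic: the number needed to treat is the reciprocal of the absolute risk reduction. The sketch below uses the rounded percentages reported for the high-signature group (3.9% with placebo, 2.1% with canakinumab); the function names are illustrative, and because the inputs are rounded, the results are approximations of the published figures rather than a reproduction of the investigators' analysis.

```python
def number_needed_to_treat(risk_control, risk_treated):
    """Return 1 / absolute risk reduction."""
    arr = risk_control - risk_treated  # absolute risk reduction
    if arr <= 0:
        raise ValueError("Treatment does not reduce risk; NNT is undefined.")
    return 1.0 / arr


def odds_ratio(risk_treated, risk_control):
    """Return the odds of the event with treatment divided by the odds with control."""
    odds_treated = risk_treated / (1.0 - risk_treated)
    odds_control = risk_control / (1.0 - risk_control)
    return odds_treated / odds_control


# Rounded event rates reported for the high-signature group.
placebo_risk = 0.039
canakinumab_risk = 0.021

nnt = number_needed_to_treat(placebo_risk, canakinumab_risk)
or_estimate = odds_ratio(canakinumab_risk, placebo_risk)

print("Absolute risk reduction:", round(placebo_risk - canakinumab_risk, 3))
print("Approximate NNT:", round(nnt, 1))
print("Approximate odds ratio:", round(or_estimate, 2))
```

With these rounded inputs, the calculation gives a number needed to treat of roughly 55.6 and an odds ratio of about 0.53, in line with the reported values of 55 and 0.52; the small differences presumably reflect rounding of the published percentages.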
“This study is particularly powerful because it links a measurable signal in the blood to the underlying inflammatory processes implicated in cancer development,” added Iain Foulkes, PhD, Executive Director of Research and Innovation at Cancer Research UK and CEO of Cancer Research Horizons, who was not involved in the research. “By combining large-scale population data, advanced computational approaches, and fundamental biologic insights, it opens the door to a new era of precision prevention, where preventive treatments can be targeted to those most likely to benefit.”
Validation of the signature in further studies is needed, as is the development of a clinically usable test. A clinical trial of canakinumab for lung cancer prevention would also be required. Although the signature combines lung-specific and pleiotropic inflammatory components, replacing the former with other organ-specific plasma signals could extend this precision prevention approach to other cancer types, the investigators wrote.
DISCLOSURES: For full disclosures of the study authors, funding information, and data and code availability, visit cell.com.
ASCO AI in Oncology is published by Conexiant under a license arrangement with the American Society of Clinical Oncology, Inc. (ASCO®). The ideas and opinions expressed in ASCO AI in Oncology do not necessarily reflect those of Conexiant or ASCO.