
Federated Learning Enables Robust Prognostic Modeling in Anal Cancer Across International Real-World Cohorts

Stable results across multiple centers highlight the potential of federated learning to generate generalizable prognostic models for rare cancers.

April 02, 2026 By Meg Barbor 6 min read

In an era of increasingly granular precision oncology, even historically “common” cancers are fragmenting into biologically distinct and effectively rare subgroups. This shift has exposed a fundamental limitation of traditional research paradigms: the inability to assemble sufficiently large, high-quality data sets to support reliable prognostic modeling across heterogeneous patient populations.

A new international study from the atomCAT (Anal Cancer Treatment Outcome Modelling with Computer-Aided Theragnostics) consortium demonstrates how federated learning may offer a scalable solution to this problem, enabling collaborative model development across institutions without requiring patient-level data sharing. The study, which used federated learning to develop prediction models for outcomes in patients with anal cancer who received chemoradiotherapy, was published in Nature Communications.

A Distributed Approach to a Data-Limited Problem

The investigators developed and validated prognostic models for overall survival, locoregional control, and freedom from distant metastases using data from 1,428 patients treated across 14 centers in Europe and Australia, with external validation in an additional 277 patients from two independent institutions. All patients had anal cancer and had undergone modern chemotherapy and/or radiotherapy concurrently.

Rather than pooling data centrally, a process often hindered by regulatory, privacy, and governance barriers, the study used a federated architecture. In this framework, patient data remained securely within each institution. Models were trained iteratively: each participating center sent summary parameters (eg, coefficients, gradients) to a central server, which aggregated them to produce a global model.

In practical terms, each site “trains” the model locally on its own data, sends back summary updates, and receives an improved global model in return. This cycle repeats until convergence, allowing the system to learn from geographically dispersed data sets without exposing individual patient records.
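The cycle described above can be illustrated with a minimal sketch. It is not the atomCAT implementation: the article does not specify the model form or software interface, so a simple linear model fitted by gradient descent stands in here, and every name (`local_gradient`, `federated_fit`, `site_a`, `site_b`) and every data value is hypothetical. Each site returns only a summed gradient and a row count, and the server combines them into one global update.

```python
from typing import List, Tuple

# Hypothetical local data set: (features, outcome) rows that never leave the site.
Site = List[Tuple[List[float], float]]

def local_gradient(weights: List[float], rows: Site) -> Tuple[List[float], int]:
    """Runs inside one site: returns a summed squared-error gradient and the row count."""
    grad = [0.0] * len(weights)
    for x, y in rows:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += err * xi
    return grad, len(rows)

def federated_fit(sites: List[Site], n_features: int,
                  lr: float = 0.1, rounds: int = 1000) -> List[float]:
    """Runs on the central server: only aggregated summaries are exchanged."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        # Each site computes its update locally on the current global weights.
        updates = [local_gradient(weights, rows) for rows in sites]
        total = sum(n for _, n in updates)
        # The server aggregates the summaries and updates the global model.
        for j in range(n_features):
            weights[j] -= lr * sum(g[j] for g, _ in updates) / total
    return weights

# Illustrative data only: outcome = 1 + 2 * x, split across two "centers".
site_a: Site = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
site_b: Site = [([1.0, 3.0], 7.0), ([1.0, 4.0], 9.0)]

federated = federated_fit([site_a, site_b], n_features=2)
pooled = federated_fit([site_a + site_b], n_features=2)
print(federated, pooled)
```

In this toy setting the summed site gradients equal the gradient of the pooled data, so the federated and pooled fits agree up to floating-point rounding. That equivalence is a property of this simplified example, not a claim about the study's models.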

Additional Methodology

The models used eight factors expected to affect patient outcomes, including age, biological sex, T stage, nodal involvement, primary gross tumor volume, prescribed primary tumor dose, and histology.

A set of secondary models was developed to explore additional factors and alternative definitions of the same factors, such as tumor staging and performance status. No feature selection or model reduction was implemented in the federated learning architecture.

Model performance was tested internally with a leave-one-center-out approach, in which each center is held out in turn for validation, to gauge the level of overfitting.
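A leave-one-center-out split can be sketched as follows. This shows the general technique only: the function name, the center labels, and the patient records are hypothetical, and the study's actual validation code is not described in the article.

```python
from typing import Dict, Iterator, List, Tuple

def leave_one_center_out(
    centers: Dict[str, List[dict]]
) -> Iterator[Tuple[str, List[dict], List[dict]]]:
    """Yield (held_out_center, training_rows, validation_rows) for each center in turn."""
    for held_out in sorted(centers):
        # Train on every center except the one held out for validation.
        train = [row for name, rows in centers.items()
                 if name != held_out for row in rows]
        yield held_out, train, centers[held_out]

# Illustrative records only.
centers = {
    "A": [{"id": 1}, {"id": 2}],
    "B": [{"id": 3}],
    "C": [{"id": 4}, {"id": 5}],
}
folds = list(leave_one_center_out(centers))
for name, train, valid in folds:
    print(name, len(train), len(valid))
```

Because the validation center contributes nothing to training in its own fold, a large gap between training and held-out performance would point to overfitting.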

Performance and Validation Across Centers

The federated models demonstrated consistent discrimination and calibration for each of the three endpoints studied. In the primary cohort, Harrell’s concordance indices (c-indices) were 0.68 for overall survival, 0.71 for locoregional control, and 0.69 for freedom from distant metastases. External validation further reinforced generalizability, with c-indices improving to 0.72 for overall survival, 0.75 for locoregional control, and 0.79 for freedom from distant metastases.
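For readers unfamiliar with the metric, Harrell's concordance index is the share of comparable patient pairs in which the patient with the earlier observed event was also assigned the higher predicted risk. A value of 0.5 corresponds to chance and 1.0 to perfect ranking. The sketch below is a plain implementation of that definition under simplifying assumptions: pairs with tied times are ignored, tied risk scores count as half, and all data values are invented for illustration.

```python
from typing import Sequence

def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable pairs ranked correctly by predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Illustrative follow-up times, event indicators (1 = event, 0 = censored), and risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1])
mixed = harrell_c_index(times, events, [0.9, 0.1, 0.5, 0.7])
print(perfect, mixed)
```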

Notably, when comparable models were trained using data from a single center (n = 210), performance dropped substantially in external data sets (overall survival: 0.60; locoregional control: 0.70; freedom from distant metastases: 0.62), underscoring the value of distributed, multi-institutional learning.

Real-World Outcomes and Risk Stratification

At 3 years, observed outcomes across centers were 83% for overall survival, 83% for locoregional control, and 87% for freedom from distant metastases. The models enabled clinically meaningful risk stratification, with patients classified as low risk experiencing markedly better outcomes than those at high risk, including overall survival of 90% vs 73%, locoregional control of 91% vs 76%, and freedom from distant metastases of 94% vs 83%.

These separations suggest potential utility for treatment stratification and clinical decision support.

Key Prognostic Factors

Across models, several variables consistently emerged as prognostic factors:

Lower T stage and absence of nodal involvement were associated with improved survival and disease control
Smaller gross tumor volume was a strong predictor across all endpoints
Female sex was associated with improved overall survival and locoregional control
Younger age predicted better overall survival
Doublet chemotherapy (mitomycin- or cisplatin-based) significantly improved overall survival compared with no chemotherapy

Nodal involvement was associated with a 45% higher risk of death and approximately double the risk of distant metastases, while larger tumor volume was associated with more than a twofold increase in risk across endpoints, including overall survival and locoregional control.
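As a rough illustration of how such figures are read, a "45% higher risk" is treated below as a hazard ratio of 1.45 from a proportional hazards model. That reading is an assumption, since the article does not name the model form, and the function and variable names are hypothetical. Under this assumption each factor contributes the logarithm of its hazard ratio to a patient's log relative hazard.

```python
import math
from typing import Dict

def relative_hazard(hazard_ratios: Dict[str, float],
                    patient: Dict[str, float]) -> float:
    """Hazard relative to a reference patient whose covariates are all 0."""
    log_hazard = sum(math.log(hr) * patient.get(name, 0.0)
                     for name, hr in hazard_ratios.items())
    return math.exp(log_hazard)

# Assumed reading of the reported 45% higher risk of death with nodal involvement.
hazard_ratios = {"nodal_involvement": 1.45}

node_positive = relative_hazard(hazard_ratios, {"nodal_involvement": 1})
node_negative = relative_hazard(hazard_ratios, {"nodal_involvement": 0})
print(node_positive, node_negative)
```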

Interestingly, radiotherapy dose was not independently prognostic for any endpoint in these models, likely reflecting confounding by indication and inter-institutional variability in dose reporting.

Why Federated Learning Matters in Oncology

The technical innovation of this study lies less in the statistical model itself and more in how the model is trained across institutions. Federated learning effectively sidesteps one of the most persistent barriers in oncology research: data fragmentation. Rare cancers like anal carcinoma (which comprises about 0.3% of all cancers) are treated in small numbers at individual centers, making traditional large-scale analyses difficult.

By enabling analysis across distributed data sets, federated learning allows investigators to leverage real-world data at scale, preserve patient privacy while complying with data protection regulations, and capture institutional variability reflective of real-world practice.

From a methodological standpoint, this approach produces results comparable to those from traditional centralized analyses, indicating that improved collaboration and scale do not come at the expense of statistical rigor.

Clinical and Research Implications

Although model discrimination was moderate, performance is comparable to other real-world prognostic models in oncology and reflects the inherent heterogeneity of rare cancer populations.

More importantly, the study provides a framework for future research. Federated approaches could support risk-adapted trial design, inclusion of underrepresented populations (eg, older patients, who comprised about 28% of this cohort), and integration of multimodal data, including imaging and biomarkers.

The authors, including corresponding author Stelios Theophanous, PhD, of Leeds Institute of Medical Research at St. James’s, University of Leeds, United Kingdom, emphasized that real-world data and federated modeling should complement—not replace—prospective clinical trials. However, these approaches may be particularly valuable as decision support tools in hypothesis generation and identification of clinically relevant risk groups.

As oncology continues to move toward molecular sub-stratification, the distinction between “common” and “rare” cancers will continue to blur. In that context, federated learning offers a pragmatic path forward: one that aligns with both the scale and the privacy constraints of modern cancer research.

The atomCAT study demonstrates that meaningful, generalizable prognostic models can be built without centralizing data, potentially reshaping how international collaborations are conducted in oncology.

DISCLOSURES: Dr. Wee receives consultancy fees when providing continuing professional development courses for radiotherapy physicists via Elekta AB (Stockholm, Sweden). Dr. Dekker is a founder and employee of Medical Data Works B.V., which provides commercial support for Vantage6-based federated learning infrastructures. The remaining authors declared no competing interests. For access to the code developed for this study, visit nature.com.

ASCO AI in Oncology is published by Conexiant under a license arrangement with the American Society of Clinical Oncology, Inc. (ASCO®). The ideas and opinions expressed in ASCO AI in Oncology do not necessarily reflect those of Conexiant or ASCO. For more information, see Policies.
