Abstract
Importance
Large multiple sclerosis (MS) registries provide crucial real-world evidence but often suffer from missing data, inconsistencies, and privacy limitations that restrict data sharing. Using generative AI to create synthetic data (SD) is an emerging strategy that could strengthen real-world evidence research by overcoming these challenges.
Objective
To evaluate the validity of AI-generated SD in replicating real data collected in the Italian MS and Related Disorders Register (RISM), and to compare the risk of progression independent of relapse activity (PIRA) between an early intensive treatment (EIT) strategy and an escalation treatment (ESC) strategy in both real and synthetic MS cohorts.
Design, Setting, and Participants
This validation study analyzed data from RISM. AI-based generative models were trained on a sub-cohort of 1,666 patients with tabularized MRI data to generate a synthetic dataset of 4,878 patients. SD was evaluated using the Synthetic vAlidation FramEwork powered by Train (SAFE), which assesses fidelity, utility, and privacy. Clinical Synthetic Fidelity (CSF) and Nearest Neighbor Distance Ratio (NNDR) were used for statistical and privacy validation. For clinical validation, treatment outcomes under the EIT and ESC strategies were compared in both real and synthetic datasets, focusing on the risk of PIRA.
Exposures
Initial disease-modifying therapy strategy, categorized as EIT versus ESC.
Main Outcomes and Measures
The primary outcome was the occurrence of PIRA, defined as confirmed disability accrual independent of relapses. Validation metrics were CSF (≥90% considered optimal) and NNDR (range 0.60–0.85 for privacy).
Results
The synthetic dataset demonstrated high fidelity (CSF = 97%) and privacy preservation (NNDR = 0.61). Treatment effect estimates for ESC vs EIT were consistent across real and synthetic datasets, showing largely comparable trends and greater statistical significance in SD. Cox proportional hazards models confirmed the robustness of SD in estimating the risk of the first PIRA event.
Conclusions and Relevance
AI-generated SD reliably replicated treatment effect outcomes from real-world RISM data, overcoming missing data and providing a privacy-preserving alternative for data sharing and clinical research.
Key points
Question
Can artificial intelligence (AI)-generated synthetic data (SD) reliably replicate multiple sclerosis (MS) registry data and provide robust insights into progression independent of relapse activity (PIRA) under different treatment strategies?
Findings
In a cohort of 4,878 relapsing-onset MS patients from the Italian MS Register, AI-generated SD achieved high fidelity (CSF = 97%) and reproduced treatment effect outcomes. Both real and synthetic cohorts consistently showed that early intensive therapy reduced the risk of PIRA compared with an escalation strategy.
Meaning
SD can complement and enhance registry-based research by addressing missing data and supporting reproducible analyses in MS.
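The abstract reports NNDR as its privacy metric but does not spell out how it is computed. As a rough illustration only, the sketch below uses the commonly cited definition: for each synthetic record, the distance to its nearest real record divided by the distance to its second-nearest real record. Whether SAFE computes NNDR exactly this way is an assumption. The function names, the Euclidean distance, and the toy coordinates are all hypothetical and are not taken from the study. In practice, features would typically be scaled or encoded before distances are computed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nndr(synthetic_row, real_rows):
    """Nearest Neighbor Distance Ratio for one synthetic record.

    Assumed definition: distance to the closest real record divided by
    the distance to the second-closest real record. Values near 0 mean
    the synthetic record sits much closer to one real record than to any
    other; values near 1 mean it is not unusually close to any single one.
    """
    if len(real_rows) < 2:
        raise ValueError("NNDR needs at least two real records")
    distances = sorted(euclidean(synthetic_row, r) for r in real_rows)
    nearest, second_nearest = distances[0], distances[1]
    if second_nearest == 0:
        # The synthetic record duplicates two real records exactly;
        # report 0.0 as the most concerning value.
        return 0.0
    return nearest / second_nearest


def mean_nndr(synthetic_rows, real_rows):
    """Average NNDR over all synthetic records."""
    return sum(nndr(s, real_rows) for s in synthetic_rows) / len(synthetic_rows)


# Toy, made-up coordinates purely for illustration (not study data).
real = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
synthetic = [(0.0, 1.0), (1.5, 2.0)]
print(round(mean_nndr(synthetic, real), 3))
```

Under this definition, a dataset-level summary such as the reported NNDR = 0.61 would be an aggregate over all synthetic records. The choice of aggregate (mean, median, or a percentile) is another detail the abstract does not specify.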
Pietro Iaffaldano, Saverio D’Amico, Giuseppe Lucisano, Massimiliano Copetti, Tommaso Guerra, Maria A. Rocca, Francesco Patti, Giovanna De Luca, Diana Ferraro, Rocco Totaro, Vincenzo Brescia Morra, Giuseppe Salemi, Emilio Portaccio, Matteo Foschi, Matilde Inglese, Maria Gabriella Coniglio, Clara Grazia Chisari, Francesca Caputo, Damiano Paolicelli, Mario Alberto Battaglia, Matteo Della Porta, Victor Savevski, Mattia Delleani, Filomena Colella, Elisabetta Sauta, Maria Pia Amato, Massimo Filippi, María Trojano (2025). Validation of Generative AI Techniques for Synthetic Data Generation in Multiple Sclerosis Research: A Comparison with Real-World Evidence from the Italian MS Registry. DOI: https://doi.org/10.1101/2025.11.13.25340076.
Type
Preprint
Year
2025
Authors
28
DOI
https://doi.org/10.1101/2025.11.13.25340076