Machine Learning in Multi-Omics Data to Assess Longitudinal Predictors of Glycaemic Health

Laurie Prélot; Harmen H. M. Draisma; Mila Desi Anasanti; Zhanna Balkhiyarova; Matthias Wielscher; Loïc Yengo; Beverley Balkau; Ronan Roussel; Sylvain Sebért; Mika Ala‐Korpela; Philippe Froguel; Paul M Ridker; Marika Kaakinen; Inga Prokopenko

doi:10.1101/358390

Preprint

2018

Abstract

Type 2 diabetes (T2D) is a global health burden that will benefit from personalised risk prediction and targeted prevention programmes. Omics data have enabled more detailed risk prediction; however, most studies have focussed directly on the ability of DNA variants to predict T2D onset, with less attention given to epigenetic regulation and glycaemic trait variability. By applying machine learning to the longitudinal Northern Finland Birth Cohort 1966 (NFBC1966), with data collected at 31 (T1) and 46 (T2) years of age, we predicted fasting glucose (FG) and insulin (FI), glycated haemoglobin (HbA1c), and 2-hour glucose and insulin from an oral glucose tolerance test (2hGlu, 2hIns) at T2 in 513 individuals from 1,001 variables at T1 and T2, including anthropometric, metabolic, metabolomic and epigenetic variables. We further tested whether the information obtained by the machine learning models in NFBC1966 could be used to predict glycaemic traits in an independent French study with 48 matching predictors (DESIR, N=769, age range 30-65 years at recruitment, interval between data collections: 9 years). In this study, FG and FI were best predicted, with average R² values of 0.38 and 0.53, respectively. Sex, branched-chain and aromatic amino acids, HDL-cholesterol, glycerol, ketone bodies, blood pressure at T2 and measurements of adiposity at T1, as well as multiple methylation marks at both time points, were amongst the top predictors. In the validation analysis, we reached R² values of 0.41/0.55 for FG/FI when trained and tested in NFBC1966 and 0.17/0.30 when trained in NFBC1966 and tested in DESIR. We identified clinically relevant sets of predictors from a large multi-omics dataset and highlighted the potential of methylation markers and longitudinal changes in prediction.
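The abstract reports R² both for models trained and tested within one cohort and for models trained in one cohort and tested in another. The sketch below illustrates only that evaluation pattern. It is not the authors' pipeline: the abstract does not name the learning algorithm, so a one-predictor least-squares line stands in for it, and every number, variable name and helper function here is a hypothetical placeholder rather than study data.

```python
# Illustrative sketch only: synthetic numbers, not data from NFBC1966 or DESIR.
# A single-predictor least-squares line is an assumed stand-in for the
# (unspecified) machine learning model.

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical discovery cohort, split into training and held-out parts.
cohort_a_train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
cohort_a_train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
cohort_a_test_x = [1.5, 3.5, 4.5]
cohort_a_test_y = [3.2, 6.8, 9.1]

# Hypothetical independent cohort measured with the same predictor.
cohort_b_x = [1.5, 2.5, 3.5, 4.5]
cohort_b_y = [3.5, 4.6, 7.9, 8.4]

slope, intercept = fit_line(cohort_a_train_x, cohort_a_train_y)


def predict(xs):
    """Apply the line fitted on the training part of cohort A."""
    return [slope * v + intercept for v in xs]


r2_within = r_squared(cohort_a_test_y, predict(cohort_a_test_x))
r2_external = r_squared(cohort_b_y, predict(cohort_b_x))

print(f"R^2, trained and tested in cohort A: {r2_within:.2f}")
print(f"R^2, trained in cohort A, tested in cohort B: {r2_external:.2f}")
```

The model is fitted once and then scored on data it has not seen, first from the same cohort and then from a different one. A lower external R², as in the abstract's 0.17/0.30 against 0.41/0.55, is what one would generally expect when cohorts differ in population or measurement.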
