Are large language models superhuman chemists?

Adrian Mirza; Nawaf Alampara; Sreekanth Kunchapu; Benedict Emoekabu; Aswanth Krishnan; Tanya Gupta; Macjonathan Okereke; Amir Mohammad Elahi; Mehrdad Asgari; J. Eberhardt; Maximilian Greiner; Caroline T. Holick; Christina Glaubitz; Tim Hoffmann; Lea C. Klepsch; Yannik Köster; Fabian Alexander Kreth; Jakob Meyer; Santiago Miret; Michael Ringleb; Nicole C. Roesner; Ulrich Sigmar Schubert; Leanne M. Stafast; Dinga Wonanke; Michael Pieler; Philippe Schwaller; Kevin Maik Jablonka

doi:10.48550/arxiv.2404.01475

Preprint

2024

DOI: 10.48550/arxiv.2404.01475 arxiv.org/abs/2404.01475

Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained. However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here, we introduce "ChemBench," an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs, and found that the best models outperformed the best human chemists in our study on average. However, the models struggle with some basic tasks and provide overconfident predictions. These findings reveal LLMs' impressive chemical capabilities while emphasizing the need for further research to improve their safety and usefulness. They also suggest adapting chemistry education and show the value of benchmarking frameworks for evaluating LLMs in specific domains.
