From text to insight: large language models for materials science data extraction

M Schilling-Wilhelmi, M Ríos-García, S Shabih… - arXiv preprint arXiv …, 2024 - arxiv.org
The vast majority of materials science knowledge exists in unstructured natural language,
yet structured data is crucial for innovative and systematic materials design. Traditionally, the …

Integrating machine learning and large language models to advance exploration of electrochemical reactions

Z Zheng, F Florit, B Jin, H Wu, SC Li… - Angewandte …, 2024 - Wiley Online Library
Electrochemical C‐H oxidation reactions offer a sustainable route to functionalize
hydrocarbons, yet identifying suitable substrates and optimizing synthesis remain …

Large property models: a new generative machine-learning formulation for molecules

T Jin, V Singla, HH Hsu, BM Savoie - Faraday Discussions, 2025 - pubs.rsc.org
Generative models for the inverse design of molecules with particular properties have been
heavily hyped, but have yet to demonstrate significant gains over machine-learning …

Suitability of large language models for extraction of high-quality chemical reaction dataset from patent literature

SR Vangala, SR Krishnan, N Bung… - Journal of …, 2024 - Springer
With the advent of artificial intelligence (AI), it is now possible to design diverse and novel
molecules from previously unexplored chemical space. However, a challenge for chemists is …

What I Learned from Analyzing Accurate Mass Data of 3000 Supporting Information Files

M Christmann - Organic Letters, 2024 - ACS Publications
A Python script for the systematic, high-throughput analysis of accurate mass data was
developed and tested on more than 3000 Supporting Information (SI) PDFs from Organic …

LLMatDesign: Autonomous Materials Discovery with Large Language Models

S Jia, C Zhang, V Fung - arXiv preprint arXiv:2406.13163, 2024 - arxiv.org
Discovering new materials can have significant scientific and technological implications but
remains a challenging problem today due to the enormity of the chemical space. Recent …

Augmented and Programmatically Optimized LLM Prompts Reduce Chemical Hallucinations

S Reed - 2025 - chemrxiv.org
Utilizing Large Language Models (LLMs) for handling scientific information comes with risk
of the outputs not matching expectations, commonly called hallucinations. To fully utilize …

What I Learned from Analyzing Accurate Mass Data of 3000 SI Files

M Christmann - 2024 - chemrxiv.org
A Python script for the systematic, high-throughput analysis of accurate mass data was
developed and tested on over 3,000 Supporting Information (SI) PDFs from Organic Letters …