Data-driven materials research enabled by natural language processing and information extraction

EA Olivetti, JM Cole, E Kim, O Kononova… - Applied Physics …, 2020 - pubs.aip.org
Given the emergence of data science and machine learning throughout all aspects of
society, but particularly in the scientific domain, there is increased importance placed on …

Opportunities and challenges for machine learning in materials science

D Morgan, R Jacobs - Annual Review of Materials Research, 2020 - annualreviews.org
Advances in machine learning have impacted myriad areas of materials science, such as
the discovery of novel materials and the improvement of molecular simulations, with likely …

Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science

A Trewartha, N Walker, H Huo, S Lee, K Cruse… - Patterns, 2022 - cell.com
A bottleneck in efficiently connecting new materials discoveries to established literature has
arisen due to an increase in publications. This problem may be addressed by using named …

MatSciBERT: A materials domain language model for text mining and information extraction

T Gupta, M Zaki, NMA Krishnan, Mausam - npj Computational Materials, 2022 - nature.com
A large amount of materials science knowledge is generated and stored as text published in
peer-reviewed scientific literature. While recent developments in natural language …

An analysis of simple data augmentation for named entity recognition

X Dai, H Adel - arXiv preprint arXiv:2010.11683, 2020 - arxiv.org
Simple yet effective data augmentation techniques have been proposed for sentence-level
and sentence-pair natural language processing tasks. Inspired by these efforts, we design …

Lifelong pretraining: Continually adapting language models to emerging corpora

X Jin, D Zhang, H Zhu, W Xiao, SW Li, X Wei… - arXiv preprint arXiv …, 2021 - arxiv.org
Pretrained language models (PTLMs) are typically learned over a large, static corpus and
further fine-tuned for various downstream tasks. However, when deployed in the real world …

Opportunities and challenges of text mining in materials research

O Kononova, T He, H Huo, A Trewartha, EA Olivetti… - iScience, 2021 - cell.com
Research publications are the major repository of scientific knowledge. However, their
unstructured and highly heterogeneous format creates a significant obstacle to large-scale …

Automated extraction of chemical synthesis actions from experimental procedures

AC Vaucher, F Zipoli, J Geluykens, VH Nair… - Nature …, 2020 - nature.com
Experimental procedures for chemical synthesis are commonly reported in prose in patents
or in the scientific literature. The extraction of the details necessary to reproduce and …

A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing

P Shetty, AC Rajan, C Kuenneth, S Gupta… - npj Computational …, 2023 - nature.com
The ever-increasing number of materials science articles makes it hard to infer chemistry-
structure-property relations from literature. We used natural language processing methods to …

Language models and protocol standardization guidelines for accelerating synthesis planning in heterogeneous catalysis

M Suvarna, AC Vaucher, S Mitchell, T Laino… - Nature …, 2023 - nature.com
Synthesis protocol exploration is paramount in catalyst discovery, yet keeping pace with
rapid literature advances is increasingly time intensive. Automated synthesis protocol …