Information retrieval and text mining technologies for chemistry

M Krallinger, O Rabal, A Lourenco, J Oyarzabal… - Chemical …, 2017 - ACS Publications
Efficient access to chemical information contained in scientific literature, patents, technical
reports, or the web is a pressing need shared by researchers and patent attorneys from …

Opportunities and challenges of text mining in materials research

O Kononova, T He, H Huo, A Trewartha, EA Olivetti… - iScience, 2021 - cell.com
Research publications are the major repository of scientific knowledge. However, their
unstructured and highly heterogenous format creates a significant obstacle to large-scale …

ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature

MC Swain, JM Cole - Journal of chemical information and …, 2016 - ACS Publications
The emergence of “big data” initiatives has led to the need for tools that can automatically
extract valuable chemical information from large volumes of unstructured data, such as the …

Assessing the impact of OCR quality on downstream NLP tasks

D Van Strien, K Beelen, MC Ardanuy, K Hosseini… - 2020 - repository.cam.ac.uk
A growing volume of heritage data is being digitized and made available as text via optical
character recognition (OCR). Scholars and libraries are increasingly using OCR-generated …

How noisy social media text, how diffrnt social media sources?

T Baldwin, P Cook, M Lui, A MacKinlay… - Proceedings of the sixth …, 2013 - aclanthology.org
While various claims have been made about text in social media text being noisy, there has
never been a systematic study to investigate just how linguistically noisy or otherwise it is …

PySBD: Pragmatic sentence boundary disambiguation

N Sadvilkar, M Neumann - arXiv preprint arXiv:2010.09657, 2020 - arxiv.org
In this paper, we present a rule-based sentence boundary disambiguation Python package
that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which …

Part-of-speech tagging for code-mixed English-Hindi Twitter and Facebook chat messages

A Jamatia, B Gambäck, A Das - 2015 - ntnuopen.ntnu.no
The paper reports work on collecting and annotating code-mixed English-Hindi social
media text (Twitter and Facebook messages), and experiments on automatic tagging of …

Machine learning to support social media empowered patients in cancer care and cancer treatment decisions

D De Silva, W Ranasinghe, T Bandaragoda, A Adikari… - PLoS ONE, 2018 - journals.plos.org
Background: A primary variant of social media, online support groups (OSG) extend beyond
the standard definition to incorporate a dimension of advice, support and guidance for …

A survey on syntactic processing techniques

X Zhang, R Mao, E Cambria - Artificial Intelligence Review, 2023 - Springer
Computational syntactic processing is a fundamental technique in natural language
processing. It normally serves as a pre-processing method to transform natural language …

Sentence boundary detection in legal text

G Sanchez - Proceedings of the natural legal language …, 2019 - aclanthology.org
In this paper, we examined several algorithms to detect sentence boundaries in legal text.
Legal text presents challenges for sentence tokenizers because of the variety of …