Capitalization and punctuation restoration: a survey

V Păiş, D Tufiş - Artificial Intelligence Review, 2022 - Springer
Ensuring proper punctuation and letter casing is a key pre-processing step towards applying
complex natural language processing algorithms. This is especially significant for textual …

Robust named entity recognition with truecasing pretraining

S Mayhew, N Gupta, D Roth - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
Although modern named entity recognition (NER) systems show impressive performance on
standard datasets, they perform poorly when presented with noisy data. In particular …

Fullstop: Punctuation and segmentation prediction for Dutch with transformers

V Vandeghinste, O Guhr - Language Resources and Evaluation, 2024 - Springer
When applying automated speech recognition (ASR) for Belgian Dutch, the output consists
of an unsegmented stream of words, without any punctuation. A next step is to perform …

Towards automatic generation of shareable synthetic clinical notes using neural language models

O Melamud, C Shivade - arXiv preprint arXiv:1905.07002, 2019 - arxiv.org
Large-scale clinical data is invaluable to driving many computational scientific advances
today. However, understandable concerns regarding patient privacy hinder the open …

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

Cocomix: Utilizing comments to improve non-visual webtoon accessibility

M Huh, YJ Lee, D Choi, H Kim, U Oh… - Proceedings of the 2022 …, 2022 - dl.acm.org
Webtoon is a type of digital comics read online where readers can leave comments to share
their thoughts on the story. While it has experienced a surge in popularity internationally …

Capitalization normalization for language modeling with an accurate and efficient hierarchical RNN model

H Zhang, YC Cheng, S Kumar… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Capitalization normalization (truecasing) is the task of restoring the correct case (uppercase
or lowercase) of noisy text. We propose a fast, accurate and compact two-level hierarchical …

Capitalization Feature and Learning Rate for Improving NER Based on RNN BiLSTM-CRF

E Noersasongko - 2022 IEEE International Conference on …, 2022 - ieeexplore.ieee.org
Entity extraction in the natural language processing research field is still a widely
researched topic. It can be a data source for the next NLP stage, such as text summarization …

From dataset recycling to multi-property extraction and beyond

T Dwojak, M Pietruszka, Ł Borchmann… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper investigates various Transformer architectures on the WikiReading Information
Extraction and Machine Reading Comprehension dataset. The proposed dual-source model …

NER and POS when nothing is capitalized

S Mayhew, T Tsygankova, D Roth - arXiv preprint arXiv:1903.11222, 2019 - arxiv.org
For those languages which use it, capitalization is an important signal for the fundamental
NLP tasks of Named Entity Recognition (NER) and Part of Speech (POS) tagging. In fact, it is …