Automatic multilingual stopwords identification from very small corpora

S Ferilli - Electronics, 2021 - mdpi.com
Tools for Natural Language Processing work using linguistic resources, that are
language-specific. The complexity of building such resources causes many languages to lack them …

Automatic stopwords identification from very small corpora

S Ferilli, GL Izzi, T Franza - Intelligent Systems in Industrial Applications, 2021 - Springer
Abstract Natural Language Processing tools use language-specific linguistic resources, that
might be unavailable for many languages. Since manually building them is complex, it …

Stopwords identification by means of characteristic and discriminant analysis

G Armano, F Fanni, A Giuliani - International Conference on Agents …, 2015 - scitepress.org
Stopwords are meaningless, non-significant terms that frequently occur in a document. They
should be removed, like a noise. Traditionally, two different approaches of building a stoplist …

Extracting domain-specific stopwords for text classifiers

M Makrehchi, MS Kamel - Intelligent Data Analysis, 2017 - content.iospress.com
In this paper, an automatic generation of domain-specific stopwords from a large labeled
corpus is proposed. In the majority of text mining tasks, stopwords are removed according to …

Automatic extraction of domain-specific stopwords from labeled documents

M Makrehchi, MS Kamel - European Conference on Information Retrieval, 2008 - Springer
Automatic extraction of domain-specific stopword list from a large labeled corpus is
discussed. Most researches remove the stopwords using a standard stopword list, and high …

Automatic learning of linguistic resources for stopword removal and stemming from text

S Ferilli, F Esposito, D Grieco - Procedia Computer Science, 2014 - Elsevier
While multimedia digital documents are progressively spreading, most of the content of
Digital Libraries is still in the form of text, and this predominance will probably never be …

Automatic Extraction of Indonesian Stopwords

HTY Achsan, H Suhartanto, WC Wibowo… - International Journal of …, 2023 - academia.edu
The rapid growth of the Indonesian language content on the Internet has drawn researchers'
attention. By using natural language processing, they can extract high-value information …

Loanword identification based on web resources: A case study on wikipedia

C Mi - Computer Speech & Language, 2023 - Elsevier
To alleviate the resource scarcity and improve the robustness in loanword identification, the
current study proposes a novel loanword identification method based on Wikipedia. In this …

A universal information theoretic approach to the identification of stopwords

M Gerlach, H Shi, LAN Amaral - Nature Machine Intelligence, 2019 - nature.com
One of the most widely used approaches in natural language processing and information
retrieval is the so-called bag-of-words model. A common component of such methods is the …

Context aware stopwords for Sinhala Text classification

SVS Gunasekara, PS Haddela - 2018 National Information …, 2018 - ieeexplore.ieee.org
When working with Text Classification (TC), often the term "stopword" can be heard. Words
in a document that are frequently occurring, but meaningless in terms of Information …