Related articles - Academic resource search

Automatic multilingual stopwords identification from very small corpora

S Ferilli - Electronics, 2021 - mdpi.com

Tools for Natural Language Processing work using linguistic resources, that are language-
specific. The complexity of building such resources causes many languages to lack them …

Cited by 11 - Related articles - All 5 versions

Automatic stopwords identification from very small corpora

S Ferilli, GL Izzi, T Franza - Intelligent Systems in Industrial Applications, 2021 - Springer

Abstract Natural Language Processing tools use language-specific linguistic resources, that
might be unavailable for many languages. Since manually building them is complex, it …

Cited by 5 - Related articles - All 3 versions

Stopwords identification by means of characteristic and discriminant analysis

G Armano, F Fanni, A Giuliani - International Conference on Agents …, 2015 - scitepress.org

Stopwords are meaningless, non-significant terms that frequently occur in a document. They
should be removed, like a noise. Traditionally, two different approaches of building a stoplist …

Cited by 11 - Related articles - All 5 versions

Extracting domain-specific stopwords for text classifiers

M Makrehchi, MS Kamel - Intelligent Data Analysis, 2017 - content.iospress.com

In this paper, an automatic generation of domain-specific stopwords from a large labeled
corpus is proposed. In the majority of text mining tasks, stopwords are removed according to …

Cited by 29 - Related articles - All 2 versions

Automatic extraction of domain-specific stopwords from labeled documents

M Makrehchi, MS Kamel - European Conference on Information Retrieval, 2008 - Springer

Automatic extraction of domain-specific stopword list from a large labeled corpus is
discussed. Most researches remove the stopwords using a standard stopword list, and high …

Cited by 93 - Related articles - All 9 versions

Automatic learning of linguistic resources for stopword removal and stemming from text

S Ferilli, F Esposito, D Grieco - Procedia Computer Science, 2014 - Elsevier

While multimedia digital documents are progressively spreading, most of the content of
Digital Libraries is still in the form of text, and this predominance will probably never be …

Cited by 52 - Related articles - All 8 versions

[PDF] Automatic Extraction of Indonesian Stopwords

HTY Achsan, H Suhartanto, WC Wibowo… - International Journal of …, 2023 - academia.edu

The rapid growth of the Indonesian language content on the Internet has drawn researchers'
attention. By using natural language processing, they can extract high-value information …

Cited by 7 - Related articles - All 6 versions

Loanword identification based on web resources: A case study on wikipedia

C Mi - Computer Speech & Language, 2023 - Elsevier

To alleviate the resource scarcity and improve the robustness in loanword identification, the
current study proposes a novel loanword identification method based on Wikipedia. In this …

Cited by 2 - Related articles - All 2 versions

A universal information theoretic approach to the identification of stopwords

M Gerlach, H Shi, LAN Amaral - Nature Machine Intelligence, 2019 - nature.com

One of the most widely used approaches in natural language processing and information
retrieval is the so-called bag-of-words model. A common component of such methods is the …

Cited by 65 - Related articles - All 3 versions

Context aware stopwords for Sinhala Text classification

SVS Gunasekara, PS Haddela - 2018 National Information …, 2018 - ieeexplore.ieee.org

When working with Text Classification (TC), often the term "stopword" can be heard. Words
in a document that are frequently occurring, but meaningless in terms of Information …

Cited by 16 - Related articles - All 3 versions
