C Boulis, M Ostendorf - Proc. of the International Workshop in …, 2005 - researchgate.net
The most prevalent representation for text classification is the bag-of-words vector. A number of approaches have sought to replace or augment the bag-of-words representation with …
In this article we propose a data treatment strategy to generate new discriminative features, called compound-features (or c-features), for the sake of text classification. These c-features …
AK Uysal, S Gunal - Information processing & management, 2014 - Elsevier
Preprocessing is one of the key components in a typical text classification framework. This paper aims to extensively examine the impact of preprocessing on text classification in terms …
With the advent of modern pre-trained Transformers, text preprocessing has started to be neglected and is not specifically addressed in recent NLP literature. However, both from …
Text Classification pipelines are a sequence of tasks needed to be performed to classify documents into a set of predefined categories. The pre-processing phase (before training) of …
F Li, Y Yang - Proceedings of the 20th international conference on …, 2003 - cdn.aaai.org
This paper presents a formal analysis of popular text classification methods, focusing on their loss functions whose minimization is essential to the optimization of those methods …
Most learning algorithms that are applied to text categorization problems rely on a bag-of-words document representation, i.e., each word occurring in the document is considered as a …
Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor intensive and time …
D Yan, K Li, S Gu, L Yang - IEEE Access, 2020 - ieeexplore.ieee.org
The rapidly developing internet and other media have produced a tremendous amount of text data, making it a challenging and valuable task to find a more effective way to analyze …