A survey of text representation and embedding techniques in NLP

R Patil, S Boit, V Gudivada, J Nandigam - IEEE Access, 2023 - ieeexplore.ieee.org
Natural Language Processing (NLP) is a research field where a language in consideration
is processed to understand its syntactic, semantic, and sentimental aspects. The …

A similarity measure for text classification and clustering

YS Lin, JY Jiang, SJ Lee - IEEE transactions on knowledge and …, 2013 - ieeexplore.ieee.org
Measuring the similarity between documents is an important operation in the text processing
field. In this paper, a new similarity measure is proposed. To compute the similarity between …

Comparing and combining dimension reduction techniques for efficient text clustering

B Tang, M Shepherd, E Milios… - Proceeding of SIAM …, 2005 - researchgate.net
A great challenge of text mining arises from the increasingly large text datasets and the high
dimensionality associated with natural language. In this research, a systematic study is …

Low-complexity quantization of discrete memoryless channels

JA Zhang, BM Kurkoski - 2016 International Symposium on …, 2016 - ieeexplore.ieee.org
A quantizer design algorithm for discrete memory-less channels with non-binary inputs is
given, when the objective is to maximize the mutual information between the channel input …

A niching memetic algorithm for simultaneous clustering and feature selection

W Sheng, X Liu, M Fairhurst - IEEE Transactions on Knowledge …, 2008 - ieeexplore.ieee.org
Clustering is inherently a difficult task, and is made even more difficult when the selection of
relevant features is also an issue. In this paper we propose an approach for simultaneous …

Document clustering using character N-grams: a comparative evaluation with term-based and word-based clustering

Y Miao, V Kešelj, E Milios - Proceedings of the 14th ACM international …, 2005 - dl.acm.org
We propose a novel method for document clustering using character N-grams. In the
traditional vector-space model, the documents are represented as vectors, in which each …

Combining semantic and term frequency similarities for text clustering

VHA Soares, RJGB Campello… - … and Information Systems, 2019 - Springer
A key challenge for document clustering consists in finding a proper similarity measure for
text documents that enables the generation of cohesive groups. Measures based on the …

A statistical model of cluster stability

Z Volkovich, Z Barzily, L Morozensky - Pattern Recognition, 2008 - Elsevier
In the current paper we present a method for assessing cluster stability. This method,
combined with a clustering algorithm, yields an estimate of the data partition, namely, the …

The method of N-grams in large-scale clustering of DNA texts

Z Volkovich, V Kirzhner, A Bolshoy, E Nevo, A Korol - Pattern Recognition, 2005 - Elsevier
This paper is devoted to the techniques of clustering of texts based on the comparison of
vocabularies of N-grams. In contrast to the regular N-grams approach, the proposed N …

A visual approach for interactive keyterm-based clustering

S Nourashrafeddin, E Sherkat, R Minghim… - ACM Transactions on …, 2018 - dl.acm.org
The keyterm-based approach is arguably intuitive for users to direct text-clustering
processes and adapt results to various applications in text analysis. Its way of markedly …