SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations

D Mekala, V Gupta, B Paranjape, H Karnick - arXiv preprint arXiv …, 2016 - arxiv.org
We present a feature vector formation technique for documents, Sparse Composite
Document Vector (SCDV), which overcomes several shortcomings of the current …

Improving document classification with multi-sense embeddings

V Gupta, A Kumar, P Nokhiz, H Gupta, P Talukdar - ECAI 2020, 2020 - ebooks.iospress.nl
Efficient representation of text documents is an important building block in many NLP tasks.
Research on long text categorization has shown that simple weighted averaging of word …

Vector of locally-aggregated word embeddings (VLAWE): A novel document-level representation

RT Ionescu, AM Butnaru - arXiv preprint arXiv:1902.08850, 2019 - arxiv.org
In this paper, we propose a novel representation for text documents based on aggregating
word embedding vectors into document embeddings. Our approach is inspired by the Vector …

Efficient vector representation for documents through corruption

M Chen - arXiv preprint arXiv:1707.02377, 2017 - arxiv.org
We present an efficient document representation learning framework, Document Vector
through Corruption (Doc2VecC). Doc2VecC represents each document as a simple average …

Multivariate gaussian document representation from word embeddings for text categorization

G Nikolentzos, P Meladianos, F Rousseau… - Proceedings of the …, 2017 - aclanthology.org
Recently, there has been a lot of activity in learning distributed representations of words in
vector spaces. Although there are models capable of learning high-quality distributed …

Document embedding with paragraph vectors

AM Dai, C Olah, QV Le - arXiv preprint arXiv:1507.07998, 2015 - arxiv.org
Paragraph Vectors has been recently proposed as an unsupervised method for learning
distributed representations for pieces of texts. In their work, the authors showed that the …

Improving a tf-idf weighted document vector embedding

CW Schmidt - arXiv preprint arXiv:1902.09875, 2019 - arxiv.org
We examine a number of methods to compute a dense vector embedding for a document in
a corpus, given a set of word vectors such as those from word2vec or GloVe. We describe …

Word embeddings for natural language processing

RP Lebret - 2016 - infoscience.epfl.ch
Word embedding is a feature learning technique which aims at mapping words from a
vocabulary into vectors of real numbers in a low-dimensional space. By leveraging large …

Paragraph Vector Representation Based on Word to Vector and CNN Learning

Z Xiong, Q Shen, Y Wang, C Zhu - Computers, Materials & …, 2018 - cdn.techscience.cn
Document processing in natural language includes retrieval, sentiment analysis, theme
extraction, etc. Classical methods for handling these tasks are based on models of …

Vector representation based on a supervised codebook for Nepali documents classification

C Sitaula, A Basnet, S Aryal - PeerJ Computer Science, 2021 - peerj.com
Document representation with outlier tokens degrades classification performance due
to the uncertain orientation of such tokens. Most existing document representation methods …