Vector space representations of documents in classifying Finnish social media texts

V Venekoski, S Puuska, J Vankka - … , October 13-15, 2016, Proceedings 22, 2016 - Springer
Computational analysis of linguistic data requires that texts are transformed into numeric
representations. The aim of this research is to evaluate different methods for building vector …

Multivariate Gaussian document representation from word embeddings for text categorization

G Nikolentzos, P Meladianos, F Rousseau… - Proceedings of the …, 2017 - aclanthology.org
Recently, there has been a lot of activity in learning distributed representations of words in
vector spaces. Although there are models capable of learning high-quality distributed …

Explorations of morphological structure in distributional space

H Baayen, D Brown, YY Chuang - The Mental Lexicon, 2023 - pure.york.ac.uk
This special issue brings together five studies that are the fruit of intense interactions
between two research projects: The 'Feast and Famine' project funded by the UK's Arts and …

Short-text representation using diffusion wavelets

V Jain, J Mahadeokar - … of the 23rd International Conference on World …, 2014 - dl.acm.org
Usual text document representations such as tf-idf do not work well in classification tasks for
short-text documents and across diverse data domains. Optimizing different representations …

Clustering comparable corpora of Russian and Ukrainian academic texts: Word embeddings and semantic fingerprints

A Kutuzov, M Kopotev, T Sviridenko… - arXiv preprint arXiv …, 2016 - arxiv.org
We present our experience in applying distributional semantics (neural word embeddings)
to the problem of representing and clustering documents in a bilingual comparable corpus …

From vector space models to vector space models of semantics

HB Barathi Ganesh, M Anand Kumar… - Text Processing: FIRE …, 2018 - Springer
This paper assesses the performance of frequency and concept based text representation in
Mixed Script Information Retrieval and Classification tasks. In text analytics, representation …

Reduction of dimensionality of feature vectors in subject classification of text documents

T Walkowiak, S Datko, H Maciejewski - … , RelStat'18, 17-20 October 2018 …, 2019 - Springer
Within a paper we investigate the influence of dimensionality reduction of feature vector
(PCA and random projection) on the results of subject classification of text documents in …

The analysis of text categorization represented with word embeddings using homogeneous classifiers

ZH Kilimci, S Akyokuş - 2019 IEEE International Symposium on …, 2019 - ieeexplore.ieee.org
Text data mining is the process of extracting and analyzing valuable information from text. A
text data mining process generally consists of lexical and syntax analysis of input text data …

MimicProp: Learning to incorporate lexicon knowledge into distributed word representation for social media analysis

M Yan, YR Lin, R Hwa, AM Ertugrul, M Guo… - Proceedings of the …, 2020 - ojs.aaai.org
Lexicon-based methods and word embeddings are the two widely used approaches for
analyzing texts in social media. The choice of an approach can have a significant impact on …

Derivation of document vectors from adaptation of LSTM language model

W Li, B Mak - Proceedings of the 15th Conference of the …, 2017 - aclanthology.org
In many natural language processing (NLP) tasks, a document is commonly modeled as a
bag of words using the term frequency-inverse document frequency (TF-IDF) vector. One …