Wikipedia-based hybrid document representation for textual news classification

MA Mouriño-García, R Perez-Rodriguez, L Anido-Rifon… - Soft Computing, 2018 - Springer
documents—the representations of documents we use, the selected classification algorithms,
and the description of the corpora… of documents and the two corpora employed in this work, …

Bag-of-Concepts Document Representation for Textual News Classification.

M Mouriño-García, R Pérez-Rodríguez… - Int. J. Comput …, 2015 - gelbukh.com
… consists in the classification of the two corpora presented in Section 3.1 … documents
represented as bags-of-concepts. First, it is necessary to obtain the BoC representation of documents

Wikipedia-Based Hybrid Document Representation for Textual News Classification

MAM García, RP Rodríguez, MV Ferro… - 2016 3rd International …, 2016 - ieeexplore.ieee.org
… For the classification of the corpora we used several algorithms in order to observe which
one performs the best for each corpus: Linear Support Vector Machines [17], Random Forest […

Comparative document analysis for large text corpora

X Ren, Y Lv, K Wang, J Han - Proceedings of the tenth ACM international …, 2017 - dl.acm.org
news articles published at different dates shows the power of the proposed method on
comparing sets of documents… and is thus not scalable to large corpora. Our proposed CDA is fully …

Annotated news corpora and a lexicon for sentiment analysis in Slovene

J Bučar, M Žnidaršič, J Povh - Language Resources and Evaluation, 2018 - Springer
news corpora with sentiment annotation on three levels of granularity: sentence, paragraph
and document … The corpora contain more than 250,000 documents with political, business, …

A news classification applied with new text representation based on the improved LDA

D Shao, C Li, C Huang, Y Xiang, Z Yu - Multimedia tools and applications, 2022 - Springer
… For different corpora, the best model is related to the number of topics. Perplexity value can
… 18 and 20, when using LSTM networks for news classification, the “word-document” matrix …

A corpus for multilingual document classification in eight languages

H Schwenk, X Li - arXiv preprint arXiv:1805.09821, 2018 - arxiv.org
… with a collection of 487,000 news stories. Each news story was manually classified into four
… , we also provide training corpora of 2 000, 5 000 and 10 000 documents. The development …

Newsembed: Modeling news through pre-trained document representations

J Liu, T Liu, C Yu - Proceedings of the 27th ACM SIGKDD Conference …, 2021 - dl.acm.org
News Classification. In this experiment, we evaluate NewsEmbed in the setting of document
classification. … These include several semantic textual similarity corpora in the news domain, …

Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec

D Kim, D Seo, S Cho, P Kang - Information sciences, 2019 - Elsevier
document classification. In order to increase the variety of feature sets for classification,
we transform a document using three document representation methods: term frequency–inverse …

ANT corpus: An Arabic news text collection for textual classification

A Chouigui, OB Khiroun… - 2017 IEEE/ACS 14th …, 2017 - ieeexplore.ieee.org
corpora mainly extracted from news websites. These corpora are used to train and test
classification … Arabic corpus for text classification whose documents are automatically increasing. …