A brief survey of text mining: Classification, clustering and extraction techniques

M Allahyari, S Pouriyeh, M Assefi, S Safaei… - arXiv preprint arXiv …, 2017 - arxiv.org
The amount of text that is generated every day is increasing dramatically. This tremendous
volume of mostly unstructured text cannot be simply processed and perceived by computers …

Information retrieval and text mining technologies for chemistry

M Krallinger, O Rabal, A Lourenco, J Oyarzabal… - Chemical …, 2017 - ACS Publications
Efficient access to chemical information contained in scientific literature, patents, technical
reports, or the web is a pressing need shared by researchers and patent attorneys from …

Sensecape: Enabling multilevel exploration and sensemaking with large language models

S Suh, B Min, S Palani, H Xia - Proceedings of the 36th Annual ACM …, 2023 - dl.acm.org
People are increasingly turning to large language models (LLMs) for complex information
tasks like academic research or planning a move to another city. However, while they often …

[BOOK][B] Machine learning for text: An introduction

CC Aggarwal - 2018 - Springer
The extraction of useful insights from text with various types of statistical algorithms is
referred to as text mining, text analytics, or machine learning from text. The choice of …

[BOOK][B] Data mining: the textbook

CC Aggarwal - 2015 - Springer
This textbook explores the different aspects of data mining from the fundamentals to the
complex data types and their applications, capturing the wide diversity of problem domains …

[BOOK][B] Predictive analytics and data mining: concepts and practice with rapidminer

V Kotu, B Deshpande - 2014 - books.google.com
Put Predictive Analytics into Action. Learn the basics of Predictive Analysis and Data Mining
through an easy to understand conceptual framework and immediately practice the concepts …

Mining heterogeneous information networks: a structural analysis approach

Y Sun, J Han - ACM SIGKDD explorations newsletter, 2013 - dl.acm.org
Most objects and data in the real world are of multiple types, interconnected, forming
complex, heterogeneous but often semi-structured information networks. However, most …

Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences

JR Rideout, Y He, JA Navas-Molina, WA Walters… - PeerJ, 2014 - peerj.com
We present a performance-optimized algorithm, subsampled open-reference OTU picking,
for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation …

A survey of text clustering algorithms

CC Aggarwal, CX Zhai - Mining text data, 2012 - Springer
Clustering is a widely studied data mining problem in the text domains. The problem finds
numerous applications in customer segmentation, classification, collaborative filtering …

From frequency to meaning: Vector space models of semantics

PD Turney, P Pantel - Journal of artificial intelligence research, 2010 - jair.org
Computers understand very little of the meaning of human language. This profoundly limits
our ability to give instructions to computers, the ability of computers to explain their actions to …