Handling the impact of low frequency events on co-occurrence based measures of word similarity - a case study of pointwise mutual information

F Role, M Nadif - … on Knowledge Discovery and Information Retrieval, 2011 - scitepress.org
Statistical measures of word similarity are widely used in many areas of information retrieval
and text mining. Among popular word co-occurrence based measures is Pointwise Mutual …

Beneath (or beyond) the surface: Discovering voice-leading patterns with skip-grams

DRW Sears, G Widmer - Journal of Mathematics and Music, 2021 - Taylor & Francis
Recurrent voice-leading patterns like the Mi-Re-Do compound cadence (MRDCC) rarely
appear on the musical surface in complex polyphonic textures, so finding these patterns …

Term extraction from sparse, ungrammatical domain-specific documents

A Ittoo, G Bouma - Expert Systems with Applications, 2013 - Elsevier
Existing term extraction systems have predominantly targeted large and well-written
document collections, which provide reliable statistical and linguistic evidence to support …

KR-WordRank: An unsupervised Korean word extraction method based on WordRank

H Kim, S Cho, P Kang - Journal of Korean Institute of Industrial …, 2014 - koreascience.kr
A word is the smallest unit for text analysis, and the premise behind most text-mining
algorithms is that the words in given documents can be perfectly recognized. However, the …

Finding multiwords of more than two words

A Kilgarriff, P Rychlý, V Kovář, V Baisa - Proceedings of the 15th …, 2012 - euralex.org
The prospects for automatically identifying two-word multiwords in corpora have been
explored in depth, and there are now well-established methods in widespread use. (We use …

CLAD: A corpus-derived Chinese lexical association database

SY Lin, HC Chen, TH Chang, WE Lee… - Behavior Research …, 2019 - Springer
The application of word associations has become increasingly widespread. However, the
association norms produced by traditional free association tests tend not to exceed 10,000 …

Multi-word terms selection for information retrieval

C Bechikh Ali, H Haddad, Y Slimani - Information Discovery and …, 2023 - emerald.com
Purpose: A number of approaches and algorithms have been proposed over the years as a
basis for automatic indexing. Many of these approaches suffer from precision inefficiency at …

Measuring coselectional constraint in learner corpora: A graph-based approach

AV Shadrova - 2020 - edoc.hu-berlin.de
The thesis, located in corpus linguistics, analyzes the acquisition of coselectional constraint in
learners of German as a second language in a quasi-longitudinal design based on the …

TermeX: A Tool for Collocation Extraction

D Delač, Z Krleža, J Šnajder, B Dalbelo Bašić… - … and Intelligent Text …, 2009 - Springer
Collocations–word combinations occurring together more often than by chance–have a wide
range of NLP applications. Many approaches for automating collocation extraction based on …

Improving product quality and reliability with customer experience data

A Brombacher, E Hopma, A Ittoo, Y Lu… - Quality and …, 2012 - Wiley Online Library
Advance technology development and wide use of the World Wide Web have made it
possible for new product development organizations to access multi‐sources of data‐related …