Comparing corpora using frequency profiling

P Rayson, R Garside - The workshop on comparing corpora, 2000 - aclanthology.org
This paper describes a method of comparing corpora which uses frequency profiling. The
method can be used to discover key words in the corpora which differentiate one corpus …

Text genre detection using common word frequencies

E Stamatatos, N Fakotakis… - COLING 2000 Volume 2 …, 2000 - aclanthology.org
In this paper we present a method for detecting the text genre quickly and easily following an
approach originally proposed in authorship attribution studies which uses as style markers …

Generalizing automatically generated selectional patterns

R Grishman, J Sterling - COLING 1994 Volume 2: The 15th …, 1994 - aclanthology.org
Frequency information on co-occurrence patterns can be automatically collected from a
syntactically analyzed corpus; this information can then serve as the basis for selectional …

Feature selection and feature extraction for text categorization

DD Lewis - Speech and Natural Language: Proceedings of a …, 1992 - aclanthology.org
The effect of selecting varying numbers and kinds of features for use in predicting category
membership was investigated on the Reuters and MUC-3 text categorization data sets …

Using word frequency lists to measure corpus homogeneity and similarity between corpora

A Kilgarriff - Fifth Workshop on Very Large Corpora, 1997 - aclanthology.org
How similar are two corpora? A measure of corpus similarity would be very useful for
lexicography and language engineering. Word frequency lists are cheap and easy to …

Simple maths for keywords

A Kilgarriff - Proceedings of the corpus linguistics conference, 2009 - sketchengine.co.uk
We present a simple method for identifying keywords of one corpus vs. another. There is no
one-size-fits-all list, but different lists according to the frequency range the user is interested …

N-gram-based text categorization

WB Cavnar, JM Trenkle - Proceedings of SDAIR-94, 3rd …, 1994 - dsacl3-2019.github.io
Text categorization is a fundamental task in document processing, allowing the automated
handling of enormous streams of documents in electronic form. One difficulty in handling …

Other than counting words: A linguistic approach to content analysis

CW Roberts - Social Forces, 1989 - academic.oup.com
A linguistic technique for the content analysis of texts and transcripts is described and
illustrated. The technique produces a quantitative description of texts that represents both …

Identifying terms by their family and friends

D Maynard, S Ananiadou - COLING 2000 Volume 1: The 18th …, 2000 - aclanthology.org
Multi-word terms are traditionally identified using statistical techniques or, more recently,
using hybrid techniques combining statistics with shallow linguistic information. Approaches …

Comparing corpora and identifying key words, collocations, and frequency distributions through the WordSmith Tools suite of computer programs

M Scott - Small corpus studies and ELT, 2001 - torrossa.com
This chapter describes methods of carrying out research into small corpora, a method within
reach of the language student, teacher or analyst working at home with a standard personal …