Multilingual statistical text analysis, Zipf's law and Hungarian speech generation

G Németh, C Zainkó - Acta Linguistica Hungarica, 2002 - akjournals.com
The practical challenge of creating a Hungarian e-mail reader has initiated our work on
statistical text analysis. The starting point was statistical analysis for automatic discrimination …

Finite state methods for hyphenation

G Bouma - Natural Language Engineering, 2003 - cambridge.org
Natural Language Engineering 9 (1): 5–20. © 2003 Cambridge University Press.
DOI: 10.1017/S1351324903003073. Printed in the United Kingdom …

Towards Universal Hyphenation Patterns

P Sojka, O Sojka - RASLAN, 2019 - nlp.fi.muni.cz
Hyphenation is at the core of every document preparation system, whether that is a typesetting
system such as TeX or a modern web browser. For every language, there have to be …

A comparison of data-driven automatic syllabification methods

CR Adsett, Y Marchand - … on String Processing and Information Retrieval, 2009 - Springer
Although automatic syllabification is an important component in several natural language
tasks, little has been done to compare the results of data-driven methods on a wide range of …

The unreasonable effectiveness of pattern generation

P Sojka, O Sojka - Zpravodaj Československého sdružení uživatelů TeXu, 2019 - dml.cz
Languages are constantly evolving, and so are their hyphenation rules and needs. The
effectiveness and utility of TeX's hyphenation have been proven by its usage in almost all …

Hyphenation on demand

P Sojka - TUGboat, 1999 - researchgate.net
The need to fully automate the batch typesetting process increases with the use of TeX as
the engine for high-volume and on-the-fly typeset documents which, in turn, leads to the …

An algorithm for prefix based ranked autocomplete

D Matani - arXiv preprint arXiv:2110.15535, 2021 - arxiv.org
Many search engines such as Google, Bing & Yahoo! show search suggestions when users
enter search phrases on their interfaces. These suggestions are meant to assist the user in …

Competing patterns for language engineering: Methods to handle and store empirical data

P Sojka - International Workshop on Text, Speech and Dialogue, 2000 - Springer
In this paper we describe a method of effective handling of linguistic data by means of
covering and inhibiting patterns, that is, patterns that “compete” with each other. A methodology of …

Context sensitive pattern based segmentation: A Thai challenge

P Sojka, D Antoš - Proceedings of EACL 2003 Workshop on …, 2003 - researchgate.net
A Thai written text is a string of symbols without explicit word boundary markup. A method for
developing a segmentation tool from a corpus of already segmented text is described …

Automatic non-standard hyphenation in OpenOffice.org

L Németh - TUGboat, 2006 - Citeseer
The hyphenation algorithm of OpenOffice.org 2.0.2 is a generalization of TeX's
hyphenation algorithm that allows automatic non-standard hyphenation by competing …