Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability …
Large-scale morphological databases provide essential input to a wide range of NLP applications. Inflectional data is of particular importance for morphologically rich …
According to the "human in the loop" paradigm, machine learning algorithms can improve when leveraging human intelligence, usually in the form of labels or annotation from …
We present CogNet, a large-scale, automatically built database of sense-tagged cognates: words of common origin and meaning across languages. CogNet is continuously evolving …
In today's multilingual lexical databases, the majority of the world's languages are under-represented. Beyond a mere issue of resource incompleteness, we show that existing lexical …
This paper introduces CogNet, a new, large-scale lexical database that provides cognates (words of common origin and meaning) across languages. The database currently contains …
We present a large-scale multilingual lexical resource, the Universal Knowledge Core (UKC), which is organized like a Wordnet but with a major design difference. In the …
It is well known that AI-based language technology (large language models, machine translation systems, multilingual dictionaries, and corpora) is currently limited to 2 to 3 …
Semantic heterogeneity is the problem that arises when multiple resources differ in how they represent the same real-world phenomenon. In KR, an early …