Cantonese natural language processing in the transformers era: a survey and current challenges

R Xiang, E Chersoni, Y Li, J Li, CR Huang… - Language Resources …, 2024 - Springer
Despite being spoken by a large population of speakers worldwide, Cantonese is
under-resourced in terms of the data scale and diversity compared to other major languages. This …

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

H Guo, F Xie, J Kang, Y Xiao, X Wu… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS
quality with lower supervised data requirements via Vector-Quantized Self-Supervised …

Lexical tonal effects in code-switching: A comparative study of Cantonese, Mandarin, and Vietnamese switching with English

KK Li, L Nguyen, C Bryant… - International Journal of …, 2024 - journals.sagepub.com
Aims and objectives: Previous research has revealed much about the syntactic and social
variables conditioning code-switching (i.e., the alternation between two or more languages in …

Comparing and predicting eye-tracking data in Mandarin and Cantonese

J Li, B Peng, YY Hsu, E Chersoni - 2023 - ira.lib.polyu.edu.hk
Eye-tracking data in Chinese languages present unique challenges due to the
non-alphabetic and unspaced nature of the Chinese writing systems. This paper introduces the …

CantonMT: Cantonese-English neural machine translation looking into evaluations

KY Hong, L Han, RT Batista-Navarro… - Proceedings of the 16th …, 2024 - aclanthology.org
Cantonese-English is a low-resource language pair for machine translation (MT) studies,
despite the vast amount of English content publicly available online and the large amount of …

Exploring Parents' Utterances During Free Play in Hong Kong: The Associations with Language Complexity and Lexical Diversity

COY Hung, D Lin, SW Lo - Early Childhood Education Journal, 2025 - Springer
This study had the objective of exploring the parent–child interaction pattern during free
play. The current research examined parental utterances in terms of parents' linguistic …

How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models

J Jiang, P Chen, L Chen, S Wang, Q Bao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of large language models (LLMs) has transformed the competitive
landscape in natural language processing (NLP), particularly for English and other data-rich …

[BOOK][B] Heritage Languages: Extending Variationist Approaches

N Nagy - 2024 - books.google.com
Heritage language speakers often feel discouraged from using their heritage language
because they are told they do not speak it well. This book offsets such views by investigating …

Using hidden Markov modelling to reveal in-session stages in text-based counselling

Z Fu, YC Hsu, CS Chan, J Liu, PSF Yip - npj Mental Health Research, 2024 - nature.com
Counselling sessions have multiple stages, each with its themes and objectives. This study
aimed to apply Hidden Markov Models (HMMs) to analyse counselling sessions from Open …

CantonMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation

K Yin Hong, L Han, R Batista-Navarro… - ACM Transactions on …, 2024 - dl.acm.org
This paper investigates the development and evaluation of machine translation models from
Cantonese to English (and backward), where we propose a novel approach to tackle low …