Highly effective Arabic diacritization using sequence to sequence modeling

H Mubarak, A Abdelali, H Sajjad, Y Samih, K Darwish

Proceedings of the 2019 Conference of the North American Chapter of …, 2019 • aclanthology.org

Abstract

Arabic text is typically written without short vowels (or diacritics). However, their presence is required for properly verbalizing Arabic and is hence essential for applications such as text-to-speech. There are two types of diacritics, namely core-word diacritics and case-endings. Most previous work on automatic Arabic diacritic recovery relies on a large number of manually engineered features, particularly for case-endings. In this work, we present a unified character-level sequence-to-sequence deep learning model that recovers both types of diacritics without the use of explicit feature engineering. Specifically, we employ a standard neural machine translation setup on overlapping windows of words (broken down into characters), and then we use voting to select the most likely diacritized form of each word. The proposed model outperforms all previous state-of-the-art systems. Our best settings achieve a word error rate (WER) of 4.49%, compared with the previous state of the art of 12.25% on a standard dataset.
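The windowing and voting steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the window size of 7, the stride of 1, the "_" word-boundary symbol, and the `predict_window` callback (a stand-in for the trained sequence-to-sequence model, assumed here to return exactly one diacritized form per input word) are all hypothetical choices that do not come from the abstract.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def sliding_windows(words: Sequence[str], size: int, stride: int = 1) -> List[Tuple[int, List[str]]]:
    """Return (start_index, window) pairs of overlapping windows covering every word."""
    if len(words) <= size:
        return [(0, list(words))]
    starts = list(range(0, len(words) - size + 1, stride))
    # Make sure the final words are covered even when the stride skips past them.
    if starts[-1] != len(words) - size:
        starts.append(len(words) - size)
    return [(s, list(words[s:s + size])) for s in starts]


def to_chars(window: Sequence[str], boundary: str = "_") -> List[str]:
    """Break a window of words into characters, marking word boundaries.

    The boundary symbol is an assumption made for this sketch.
    """
    chars: List[str] = []
    for i, word in enumerate(window):
        if i > 0:
            chars.append(boundary)
        chars.extend(word)
    return chars


def diacritize_with_voting(
    words: Sequence[str],
    predict_window: Callable[[List[str]], List[str]],
    size: int = 7,
    stride: int = 1,
) -> List[str]:
    """Run the (hypothetical) model on each window, then vote per word position."""
    votes = [Counter() for _ in words]
    for start, window in sliding_windows(words, size, stride):
        predicted = predict_window(window)  # assumed: one output form per input word
        for offset, form in enumerate(predicted):
            votes[start + offset][form] += 1
    # Keep the most frequent form for each word; ties go to the form seen first.
    return [counter.most_common(1)[0][0] for counter in votes]
```

Because each word appears in several windows, an error the model makes in one context can be outvoted by its predictions for the same word in the other windows. A real model would consume the character sequence from `to_chars` rather than whole words; the callback here hides that detail.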

Cited by: 50
