The multimedia community has shown a significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, the …
Languages differ in how they divide up the world into concepts and words; e.g., in contrast to English, Swahili has a single concept for 'belly' and 'womb'. We investigate these differences in …
D Wu, Y Chen, L Ding, D Tao - arXiv preprint arXiv:2104.06393, 2021 - arxiv.org
A spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones. For …
T Yu, X Liu, L Ding, K Chen, D Tao… - Proceedings of the 62nd …, 2024 - aclanthology.org
End-to-end speech translation (ST) presents notable disambiguation challenges as it necessitates simultaneous cross-modal and cross-lingual transformations. While word …
This paper presents EvAlign, a visual analytics framework for quantitative and qualitative evaluation of automatic translation alignment models. EvAlign offers various visualization …
H Lin, J Yang - IEEE Access, 2021 - ieeexplore.ieee.org
Single-image super resolution (SR) is used to reconstruct a high-resolution image with more high-frequency details based on a low-resolution image as input. In recent years, image SR …
Z Ma, J Ye, S Cheng - arXiv preprint arXiv:2308.02903, 2023 - arxiv.org
Cross-lingual adaptation has proven effective in spoken language understanding (SLU) systems with limited resources. Existing methods are frequently unsatisfactory for intent …
An ongoing challenge in current natural language processing is how its major advancements tend to disproportionately favor resource-rich languages, leaving a …
V Zouhar, D Pylypenko - arXiv preprint arXiv:2103.17250, 2021 - arxiv.org
The most common tools for word-alignment rely on a large amount of parallel sentences, which are then usually processed according to one of the IBM model algorithms. The …