H Chen, Y Chen, J Zhang - Applied Sciences, 2023 - mdpi.com
The development of neural machine translation has achieved strong translation quality on large-scale general corpora, but there are still many problems in the translation of low …
K Slagle - arXiv preprint arXiv:2404.14408, 2024 - arxiv.org
Tokenization is widely used in large language models because it significantly improves performance. However, tokenization imposes several disadvantages, such as performance …
L Huang, Y Feng - arXiv preprint arXiv:2405.19290, 2024 - arxiv.org
Subword tokenization is a common method for vocabulary building in Neural Machine Translation (NMT) models. However, increasingly complex tasks have revealed its …
In natural language processing, much current research focuses on training larger and larger models on more and more data. In this thesis, we argue that how data is represented can …
The rise of Artificial Intelligence technology has raised concerns about the potential compromise of privacy due to the handling of personal data. Private AI prevents cybercrimes …
NLP models have grown into a powerful technology and impact our social life like never before, along with rising concerns in practical applications including privacy invasion and …