Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …
Language models have become a key step to achieve state-of-the art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of …
The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to …
In recent years, computer language area has witnessed important evolvement with applications in different domains. Machine Translation MT technology, considered as a …
We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training …
M Ziemski, M Junczys-Dowmunt… - Proceedings of the …, 2016 - aclanthology.org
This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the …
In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is based on SVM-rank using linear kernels. We measure the performance of the segmenter in …
Abstract Word Sense Disambiguation is a longstanding task in Natural Language Processing, lying at the core of human language understanding. However, the evaluation of …
B Thompson, M Post - arXiv preprint arXiv:2004.14564, 2020 - arxiv.org
We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference. We …