MultiUN: A multilingual corpus from United Nation documents.

K Zhu, J Wang, J Zhou, Z Wang, H Chen… - arXiv e …, 2023 - ui.adsabs.harvard.edu

The increasing reliance on Large Language Models (LLMs) across academia and industry
necessitates a comprehensive understanding of their robustness to prompts. In response to …

Cited by 239 Related articles All 2 versions

[PDF] arxiv.org

Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org

This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Cited by 59 Related articles All 4 versions

[PDF] arxiv.org

FlauBERT: Unsupervised language model pre-training for French

H Le, L Vial, J Frej, V Segonne, M Coavoux… - arXiv preprint arXiv …, 2019 - arxiv.org

Language models have become a key step to achieve state-of-the-art results in many
different Natural Language Processing (NLP) tasks. Leveraging the huge amount of …

Cited by 559 Related articles All 8 versions

[PDF] jmlr.org

PromptBench: A unified library for evaluation of large language models

K Zhu, Q Zhao, H Chen, J Wang, X Xie - Journal of Machine Learning …, 2024 - jmlr.org

The evaluation of large language models (LLMs) is crucial to assess their performance and
mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to …

Cited by 28 Related articles All 2 versions

[PDF] ieee.org

Arabic machine translation: A survey with challenges and future directions

J Zakraoui, M Saleh, S Al-Maadeed, JM Alja'am - IEEE Access, 2021 - ieeexplore.ieee.org

In recent years, computer language area has witnessed important evolvement with
applications in different domains. Machine Translation MT technology, considered as a …

Cited by 36 Related articles All 5 versions

[PDF] aclanthology.org

Unicoder: A universal language encoder by pre-training with multiple cross-lingual tasks

H Huang, Y Liang, N Duan, M Gong, L Shou… - arXiv preprint arXiv …, 2019 - arxiv.org

We present Unicoder, a universal language encoder that is insensitive to different
languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training …

Cited by 236 Related articles All 4 versions

[PDF] aclanthology.org

The United Nations Parallel Corpus v1.0

M Ziemski, M Junczys-Dowmunt… - Proceedings of the …, 2016 - aclanthology.org

This paper describes the creation process and statistics of the official United Nations Parallel
Corpus, the first parallel corpus composed from United Nations documents published by the …

Cited by 529 Related articles All 7 versions

[PDF] aclanthology.org

Farasa: A fast and furious segmenter for Arabic

A Abdelali, K Darwish, N Durrani… - Proceedings of the 2016 …, 2016 - aclanthology.org

In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is
based on SVM-rank using linear kernels. We measure the performance of the segmenter in …

Cited by 492 Related articles All 3 versions

[PDF] uniroma1.it

Word sense disambiguation: a unified evaluation framework and empirical comparison

A Raganato, J Camacho-Collados… - Proceedings of the 15th …, 2017 - iris.uniroma1.it

Abstract Word Sense Disambiguation is a longstanding task in Natural Language
Processing, lying at the core of human language understanding. However, the evaluation of …

Cited by 419 Related articles All 19 versions

[PDF] arxiv.org

Automatic machine translation evaluation in many languages via zero-shot paraphrasing

B Thompson, M Post - arXiv preprint arXiv:2004.14564, 2020 - arxiv.org

We frame the task of machine translation evaluation as one of scoring machine translation
output with a sequence-to-sequence paraphraser, conditioned on a human reference. We …

Cited by 193 Related articles All 5 versions
