Statistical comparisons of classifiers over multiple data sets

J Demšar - The Journal of Machine Learning Research, 2006 - jmlr.org
While methods for comparing two learning algorithms on a single data set have been
scrutinized for quite some time already, the issue of statistical tests for comparisons of more …

Chatbot Arena: An open platform for evaluating LLMs by human preference

WL Chiang, L Zheng, Y Sheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have unlocked new capabilities and applications; however,
evaluating the alignment with human preferences still poses significant challenges. To …

Specializing smaller language models towards multi-step reasoning

Y Fu, H Peng, L Ou, A Sabharwal… - … on Machine Learning, 2023 - proceedings.mlr.press
The surprising ability of Large Language Models (LLMs) to perform well on complex
reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

GPT understands, too

X Liu, Y Zheng, Z Du, M Ding, Y Qian, Z Yang, J Tang - AI Open, 2024 - Elsevier
Prompting a pretrained language model with natural language patterns has proved
effective for natural language understanding (NLU). However, our preliminary study reveals …

Contrastive energy prediction for exact energy-guided diffusion sampling in offline reinforcement learning

C Lu, H Chen, J Chen, H Su, C Li… - … on Machine Learning, 2023 - proceedings.mlr.press
Guided sampling is a vital approach for applying diffusion models in real-world tasks that
embeds human-defined guidance during the sampling procedure. This paper considers a …

Sharp-MAML: Sharpness-aware model-agnostic meta learning

M Abbas, Q Xiao, L Chen, PY Chen… - … on machine learning, 2022 - proceedings.mlr.press
Model-agnostic meta learning (MAML) is currently one of the dominating
approaches for few-shot meta-learning. Despite its effectiveness, the optimization of MAML …

Confidence score for source-free unsupervised domain adaptation

J Lee, D Jung, J Yim, S Yoon - International conference on …, 2022 - proceedings.mlr.press
Source-free unsupervised domain adaptation (SFUDA) aims to obtain high performance in
the unlabeled target domain using the pre-trained source model, not the source data …

CoderEval: A benchmark of pragmatic code generation with generative pre-trained models

H Yu, B Shen, D Ran, J Zhang, Q Zhang, Y Ma… - Proceedings of the 46th …, 2024 - dl.acm.org
Code generation models based on the pre-training and fine-tuning paradigm have been
increasingly attempted by both academia and industry, resulting in well-known industrial …

Intelligent data analysis approaches to churn as a business problem: a survey

DL García, À Nebot, A Vellido - Knowledge and Information Systems, 2017 - Springer
Globalization processes and market deregulation policies are rapidly changing the
competitive environments of many economic sectors. The appearance of new competitors …