A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

A survey on text classification algorithms: From text to predictions

A Gasparetto, M Marcuzzo, A Zangari, A Albarelli - Information, 2022 - mdpi.com
In recent years, the exponential growth of digital documents has been met by rapid progress
in text classification techniques. Newly proposed machine learning algorithms leverage the …

Language model tokenizers introduce unfairness between languages

A Petrov, E La Malfa, P Torr… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent language models have shown impressive multilingual performance, even when not
explicitly trained for it. Despite this, there are concerns about the quality of their outputs …

Latent guard: a safety framework for text-to-image generation

R Liu, A Khakzar, J Gu, Q Chen, P Torr… - European Conference on …, 2025 - Springer
With the ability to generate high-quality images, text-to-image (T2I) models can be exploited
for creating inappropriate content. To prevent misuse, existing safety measures are either …

Clippo: Image-and-language understanding from pixels only

M Tschannen, B Mustafa… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Multimodal models are becoming increasingly effective, in part due to unified components,
such as the Transformer architecture. However, multimodal models still often consist of many …

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Linguistically inspired roadmap for building biologically reliable protein language models

MH Vu, R Akbar, PA Robert, B Swiatczak… - Nature Machine …, 2023 - nature.com
Deep neural-network-based language models (LMs) are increasingly applied to large-scale
protein sequence data to predict protein function. However, being largely black-box models …

mGPT: Few-Shot Learners Go Multilingual

O Shliazhko, A Fenogenova, M Tikhonova… - Transactions of the …, 2024 - direct.mit.edu
This paper introduces mGPT, a multilingual variant of GPT-3, pretrained on 61 languages
from 25 linguistically diverse language families using Wikipedia and the C4 Corpus. We …

Systematic mappings of sound to meaning: A theoretical review

DA Haslett, ZG Cai - Psychonomic Bulletin & Review, 2024 - Springer
The form of a word sometimes conveys semantic information. For example, the iconic word
gurgle sounds like what it means, and busy is easy to identify as an English adjective …

Transformers for molecular property prediction: Lessons learned from the past five years

A Sultan, J Sieg, M Mathea… - Journal of Chemical …, 2024 - ACS Publications
Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and
environmental science. Over the last decades, diverse computational techniques have been …