Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling

H Bansal, A Hosseini, R Agarwal, VQ Tran… - arXiv preprint arXiv …, 2024 - arxiv.org
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …
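
As context for the compute-matched sampling idea mentioned in the snippet, the sketch below shows the basic budget arithmetic: under a fixed generation budget, a cheaper "weak" model yields many more samples per problem than an expensive "strong" one. The budget, problem count, and per-sample costs are illustrative assumptions, not figures from the paper.

```python
# Hypothetical compute-matched sampling budget: how many solutions per
# problem fit into the same total generation budget for two models of
# different per-sample cost. All numbers below are assumed for illustration.

def samples_per_problem(total_budget_flops: float,
                        num_problems: int,
                        flops_per_sample: float) -> int:
    """Number of sampled solutions per problem under the budget."""
    return int(total_budget_flops / (num_problems * flops_per_sample))

BUDGET = 1e18        # assumed total sampling budget (FLOPs)
N_PROBLEMS = 10_000  # assumed dataset size

# Assumed per-sample costs; a small model is roughly an order of
# magnitude cheaper per generated solution than a large one.
WEAK_COST, STRONG_COST = 1e12, 1e13

print("weak model samples/problem:  ", samples_per_problem(BUDGET, N_PROBLEMS, WEAK_COST))
print("strong model samples/problem:", samples_per_problem(BUDGET, N_PROBLEMS, STRONG_COST))
```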

MM1.5: Methods, analysis & insights from multimodal LLM fine-tuning

H Zhang, M Gao, Z Gan, P Dufter, N Wenzel… - arXiv preprint arXiv …, 2024 - arxiv.org
We present MM1.5, a new family of multimodal large language models (MLLMs) designed
to enhance capabilities in text-rich image understanding, visual referring and grounding …

LLM for mobile: An initial roadmap

D Chen, Y Liu, M Zhou, Y Zhao, H Wang… - ACM Transactions on …, 2024 - dl.acm.org
When mobile meets LLMs, mobile app users deserve to have more intelligent usage
experiences. For this to happen, we argue that there is a strong need to apply LLMs for the …

Cloud Platforms for Developing Generative AI Solutions: A Scoping Review of Tools and Services

D Patel, G Raut, SN Cheetirala, GN Nadkarni… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI is transforming enterprise application development by enabling machines to
create content, code, and designs. These models, however, demand substantial …

MobileViews: A large-scale mobile GUI dataset

L Gao, L Zhang, S Wang, S Wang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Mobile screen assistants help smartphone users by interpreting mobile screens and
responding to user requests. The excessive private information on mobile screens …

Power scheduler: A batch size and token number agnostic learning rate scheduler

Y Shen, M Stallone, M Mishra, G Zhang, S Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Finding the optimal learning rate for language model pretraining is a challenging task. This
is not only because there is a complicated correlation between learning rate, batch size …
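
For background on what a learning-rate schedule of this general kind looks like, here is a generic power-law decay with linear warmup; the base rate, warmup length, and exponent are arbitrary illustrative choices and not the schedule proposed in the paper.

```python
# Generic power-law learning-rate schedule with linear warmup.
# All hyperparameters here are illustrative assumptions.

def power_lr(step: int, base_lr: float = 3e-4,
             warmup_steps: int = 1000, exponent: float = 0.5) -> float:
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps            # linear warmup
    return base_lr * (warmup_steps / (step + 1)) ** exponent  # power-law decay

for s in (0, 500, 1_000, 10_000, 100_000):
    print(s, round(power_lr(s), 6))
```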

Compress then serve: Serving thousands of LoRA adapters with little overhead

R Brüel-Gabrielsson, J Zhu, O Bhardwaj… - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuning large language models (LLMs) with low-rank adaptations (LoRAs) has become
common practice, often yielding numerous copies of the same LLM differing only in their …
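
As background, LoRA expresses each fine-tuned variant as a low-rank correction to a shared frozen weight, so an adapter is just two small matrices. The sketch below shows that standard parameterization with illustrative shapes and scaling; it is not the compression or serving scheme studied in the paper.

```python
import numpy as np

# Standard LoRA parameterization: effective weight = W + (alpha / r) * B @ A,
# where A (r x d_in) and B (d_out x r) are the only per-adapter parameters.
# Dimensions, rank, and scaling below are illustrative.
d_in, d_out, r, alpha = 4096, 4096, 16, 32
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in)) * 0.02   # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01       # adapter "down" projection
B = np.zeros((d_out, r))                        # adapter "up" projection (zero-init)

def forward(x: np.ndarray, use_adapter: bool = True) -> np.ndarray:
    y = W @ x
    if use_adapter:
        y = y + (alpha / r) * (B @ (A @ x))     # low-rank correction
    return y

x = rng.standard_normal(d_in)
print(forward(x).shape)  # (4096,)
```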

Instruction-Following Pruning for Large Language Models

B Hou, Q Chen, J Wang, G Yin, C Wang, N Du… - arXiv preprint arXiv …, 2025 - arxiv.org
With the rapid scaling of large language models (LLMs), structured pruning has become a
widely used technique to learn efficient, smaller models from larger ones, delivering superior …
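
To make the notion of structured pruning concrete, the toy sketch below removes whole output channels of a weight matrix by L2 norm. Both the norm-based criterion and the keep ratio are generic assumptions for illustration, not the instruction-conditioned pruning the paper proposes.

```python
import numpy as np

# Toy structured pruning: rank output channels (rows) by L2 norm and keep
# only the strongest fraction, producing a smaller dense matrix.
# Criterion and ratio are illustrative, not the paper's method.

def prune_rows(W: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    norms = np.linalg.norm(W, axis=1)                 # one score per output channel
    k = max(1, int(round(keep_ratio * W.shape[0])))
    keep = np.sort(np.argsort(norms)[-k:])            # indices of strongest channels
    return W[keep]

W = np.random.default_rng(0).standard_normal((8, 16))
print(prune_rows(W, keep_ratio=0.25).shape)  # (2, 16)
```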

Boosting Tool Use of Large Language Models via Iterative Reinforced Fine-Tuning

Y Zeng, X Ding, Y Wang, W Liu, W Ning, Y Hou… - arXiv preprint arXiv …, 2025 - arxiv.org
Augmenting large language models (LLMs) with external tools is a promising approach to
enhance their capabilities. Effectively leveraging this potential for complex tasks hinges …
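
For orientation, a bare-bones tool-use loop looks roughly like the sketch below: the model emits a tool call, the runtime executes it, and the result is fed back into the context. The CALL[tool](args) format, the stub model, and the calculator tool are all invented for illustration; the paper concerns the fine-tuning procedure, not this loop.

```python
import ast
import re

# Minimal tool-use loop with an assumed CALL[tool](args) format and a
# stand-in "model". Everything here is a hypothetical illustration.

TOOLS = {"add": lambda a, b: a + b}

def fake_model(context: str) -> str:
    # Stand-in for an LLM: requests a tool once, then gives a final answer.
    if "RESULT" not in context:
        return "CALL[add](2, 3)"
    return "The answer is 5."

def run(query: str, max_turns: int = 4) -> str:
    context = query
    out = ""
    for _ in range(max_turns):
        out = fake_model(context)
        m = re.match(r"CALL\[(\w+)\]\((.*)\)", out)
        if not m:
            return out                                   # no tool call: final answer
        name = m.group(1)
        args = ast.literal_eval(f"({m.group(2)},)")      # parse argument tuple
        result = TOOLS[name](*args)
        context += f"\nRESULT[{name}] = {result}"        # feed result back to the model
    return out

print(run("What is 2 + 3?"))
```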

Sparse Convolution FPGA Accelerator Based on Multi-Bank Hash Selection

J Xu, H Pu, D Wang - Micromachines, 2024 - mdpi.com
Reconfigurable processor-based acceleration of deep convolutional neural network (DCNN)
algorithms has emerged as a widely adopted technique, with particular attention on sparse …
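
As a software-level illustration of the multi-bank hashing idea, i.e. spreading nonzero elements across memory banks to limit access conflicts, the toy sketch below hashes the coordinates of a sparse tensor's nonzeros into a fixed number of banks and reports the per-bank load. The hash function, bank count, and tensor shape are assumptions, not the accelerator's actual design.

```python
from collections import Counter
import numpy as np

# Toy multi-bank placement for sparse data: hash each nonzero element's
# coordinates to one of NUM_BANKS banks and inspect how evenly the load
# spreads. Hash and bank count are illustrative only.

NUM_BANKS = 8
rng = np.random.default_rng(0)

# Random 3x3x64x64 convolution weight tensor with ~90% zeros.
weights = rng.standard_normal((3, 3, 64, 64)) * (rng.random((3, 3, 64, 64)) > 0.9)

def bank_of(coord) -> int:
    # Simple multiplicative hash over the (kh, kw, ci, co) coordinate.
    kh, kw, ci, co = coord
    return (kh * 2654435761 + kw * 40503 + ci * 97 + co) % NUM_BANKS

loads = Counter(bank_of(tuple(idx)) for idx in np.argwhere(weights != 0))
print({b: loads[b] for b in range(NUM_BANKS)})
```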