LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference

H Zhang, A Ning, RB Prabhakar… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The past year has witnessed the increasing popularity of Large Language Models (LLMs).
Their unprecedented scale and associated high hardware cost have impeded their broader …

Chakra: Advancing performance benchmarking and co-design using standardized execution traces

S Sridharan, T Heo, L Feng, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Benchmarking and co-design are essential for driving optimizations and innovation around
ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g., …

Wafer-Scale Computing: Advancements, Challenges, and Future Perspectives [Feature]

Y Hu, X Lin, H Wang, Z He, X Yu… - IEEE Circuits and …, 2024 - ieeexplore.ieee.org
Nowadays, artificial intelligence (AI) technology based on large models plays an increasingly
important role in both academia and industry. It also brings a rapidly increasing demand for …

LIBRA: Enabling Workload-Aware Multi-Dimensional Network Topology Optimization for Distributed Training of Large AI Models

W Won, S Rashidi, S Srinivasan… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
As model sizes in machine learning continue to scale, distributed training is necessary to
accommodate model weights within each device and to reduce training time. However, this …

DeepFlow: A cross-stack pathfinding framework for distributed AI systems

N Ardalani, S Pal, P Gupta - ACM Transactions on Design Automation of …, 2024 - dl.acm.org
Over the past decade, machine learning model complexity has grown at an extraordinary
rate, as has the scale of the systems training such large models. However, there is an …

MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

S Hsia, A Golden, B Acun, N Ardalani… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Training and deploying large-scale machine learning models is time-consuming, requires
significant distributed computing infrastructures, and incurs high operational costs. Our …

Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's GPT-4 with Self-Hosted Open Source SLMs in Production

C Irugalbandara, A Mahendra, R Daynauth… - arXiv preprint arXiv …, 2023 - arxiv.org
Many companies rely on APIs of managed AI models such as OpenAI's GPT-4 to create
AI-enabled experiences in their products. Along with the benefits of ease of use and shortened …

The Case For Data Centre Hyperloops

G López-Paradís, IM Hair, S Kannan… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Data movement is a hot-button topic today, with workloads like machine learning (ML)
training, graph processing, and data analytics consuming datasets as large as 30PB. Such a …

TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning

W Won, M Elavazhagan, S Srinivasan, A Durg… - arXiv preprint arXiv …, 2023 - arxiv.org
The surge of artificial intelligence, specifically large language models, has driven the rapid
development of large-scale machine learning training clusters …

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

A Bambhaniya, R Raj, G Jeong, S Kundu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown remarkable performance across a wide range
of applications, often outperforming human experts. However, deploying these parameter …