- Academic resource search

Astra-sim2.0: Modeling hierarchical networks and disaggregated systems for large-model training at scale

W Won, T Heo, S Rashidi, S Sridharan… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org

As deep learning models and input data continue to scale at an unprecedented rate, it has
become inevitable to move towards distributed training platforms to fit the models and …

Cited by: 25

Fault and self-repair for high reliability in die-to-die interconnection of 2.5D/3D IC

R Song, J Zhang, Z Zhu, G Shan, Y Yang - Microelectronics Reliability, 2024 - Elsevier

Bringing dies closer by die-to-die interconnection is a way that reduces latency and energy
per bit transmitted, while increasing bandwidth per mm of chip. Heterogeneous integration …

Cited by: 1

Heterogeneous Die-to-Die Interfaces: Enabling More Flexible Chiplet Interconnection Systems

Y Feng, D Xiang, K Ma - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org

The chiplet architecture is one of the emerging methodologies and is believed to be scalable
and economical. However, most current multi-chiplet systems are based on one uniform die …

Cited by: 3

MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

S Hsia, A Golden, B Acun, N Ardalani… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org

Training and deploying large-scale machine learning models is time-consuming, requires
significant distributed computing infrastructures, and incurs high operational costs. Our …

Leveraging Memory Expansion to Accelerate Large-Scale DL Training

D Kadiyala, S Rashidi, T Heo… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org

Modern Deep Learning (DL) models require massive clusters of specialized, high-end
nodes to train. Designing such clusters to maximize both performance and utilization is a …
