xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning

A Weingram, Y Li, H Qi, D Ng, L Dai, X Lu - Journal of Computer Science …, 2023 - Springer
Machine learning techniques have become ubiquitous both in industry and
academic applications. Increasing model sizes and training data volumes necessitate fast …

Accuracy-constrained efficiency optimization and GPU profiling of CNN inference for detecting drainage crossing locations

Y Zhang, D Pandey, D Wu, T Kundu, R Li… - Proceedings of the SC'23 …, 2023 - dl.acm.org
The accurate and efficient determination of hydrologic connectivity has garnered significant
attention from both academic and industrial sectors due to its critical implications for …

Pareto Optimization of CNN Models via Hardware-Aware Neural Architecture Search for Drainage Crossing Classification on Resource-Limited Devices

Y Li, J Baik, MM Rahman, I Anagnostopoulos… - Proceedings of the SC' …, 2023 - dl.acm.org
Embedded devices, constrained by limited memory and processors, require deep learning
models to be tailored to their specifications. This research explores customized model …

Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures

Y Li, A Kashyap, W Chen, Y Guo… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Data compression has become a crucial technique in addressing performance bottlenecks
caused by increasing data volumes in High-Performance Computing (HPC), Big Data, and …

AI and Quantum Approaches for Drug Discovery and Protein Structure Prediction

DW Perry II - 2023 - search.proquest.com
The fields of drug discovery and protein structure prediction have made significant
advancements with the integration of artificial intelligence (AI) and quantum computing …

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs

Y Feng, S Na, H Kim, H Jeon - seonjinna.github.io
With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …

Efficient HPC Architecture Design for CFD

김태수, 장효준, 박충원, 송지훈, 박남은 - dbpia.co.kr
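HPC systems for CFD simulation have evolved on the basis of distributed computing, which was
devised to overcome the performance limits of a single server. Processors and memory, interconnect networks, storage devices, and the like …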