InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU

L Oden, H Fröning - The International Journal of High …, 2017 - journals.sagepub.com
Owing to their massive parallelism and high performance per watt, GPUs have become popular in high-performance computing and are strong candidates for future exascale systems. However, communication and data transfer in GPU-accelerated systems remain challenging. Because a GPU normally cannot control a network device, a hybrid programming model is preferred, in which the GPU performs the computation and the CPU handles the communication. As a result, communication between distributed GPUs suffers from unnecessary overhead introduced by switching control flow from GPUs to CPUs and back. Furthermore, a dedicated CPU thread is often required to control GPU-related communication. In this work, we modify the user-space libraries and device drivers of the GPU and the InfiniBand network device so that the GPU can control the InfiniBand device and independently source and sink communication requests without any CPU involvement. Our results show that complex networking protocols such as InfiniBand Verbs are better handled by CPUs, since the overhead of work request generation cannot be parallelized and does not suit the highly parallel programming model of GPUs. The large number of instructions and host memory accesses required to source and sink a communication request on the GPU degrades performance. Some performance improvement can be achieved only through a massive reduction in the complexity of the InfiniBand protocol.