Runtime code generation and data management for heterogeneous computing in Java

JJ Fumero, T Remmelg, M Steuwer… - Proceedings of the …, 2015 - dl.acm.org
Proceedings of the Principles and Practices of Programming on the Java Platform, 2015 - dl.acm.org
GPUs (Graphics Processing Unit) and other accelerators are nowadays commonly found in desktop machines, mobile devices and even data centres. While these highly parallel processors offer high raw performance, they also dramatically increase program complexity, requiring extra effort from programmers. This results in difficult-to-maintain and non-portable code due to the low-level nature of the languages used to program these devices.
This paper presents a high-level parallel programming approach for the popular Java programming language. Our goal is to revitalise the old Java slogan, "Write once, run anywhere", in the context of modern heterogeneous systems. To enable the use of parallel accelerators from Java, we introduce a new API for heterogeneous programming based on array and functional programming. Applications written with our API can then be transparently accelerated on a device such as a GPU using our runtime OpenCL code generator.
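To make the programming model concrete, here is a minimal sketch of what an API built on arrays and functional composition might look like. The names `ArrayFunc`, `map` and `andThen` are hypothetical and are not the paper's actual API. This version simply evaluates on the JVM with a parallel stream; in the approach described above, the composed function would instead be passed to a runtime OpenCL code generator and executed on the accelerator.

```java
import java.util.function.DoubleUnaryOperator;
import java.util.stream.IntStream;

// Hypothetical API (illustrative only): a data-parallel computation over double[] arrays.
interface ArrayFunc {
    double[] apply(double[] input);

    // Element-wise map: each output element depends only on the matching input
    // element, so all elements can be computed independently (GPU-friendly).
    static ArrayFunc map(DoubleUnaryOperator f) {
        return input -> {
            double[] out = new double[input.length];
            // A parallel stream stands in for a kernel launch on an accelerator.
            IntStream.range(0, input.length).parallel()
                     .forEach(i -> out[i] = f.applyAsDouble(input[i]));
            return out;
        };
    }

    // Composition: run this function, then feed its result into next.
    default ArrayFunc andThen(ArrayFunc next) {
        return input -> next.apply(this.apply(input));
    }
}
```

For example, `ArrayFunc.map(x -> 2.0 * x).andThen(ArrayFunc.map(x -> x * x))` applied to `{1.0, 2.0, 3.0}` yields `{4.0, 16.0, 36.0}`. Describing the computation as composed element-wise functions, rather than as explicit loops, is what leaves a runtime free to decide where and how to execute it.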
To achieve high performance, we present data management optimizations. Usually, data has to be translated (marshalled) between the Java representation and the representation used by accelerators. This paper shows how marshalling affects runtime and presents a novel technique in Java that avoids this cost by implementing our own customised array data structure. Our design hides low-level data management from the user, making our approach applicable even for inexperienced Java programmers.
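The marshalling cost can be illustrated with a minimal sketch, assuming pairs of doubles as the element type. The names `Pair`, `Marshal` and `FlatPairArray` are hypothetical, and storing the flat array in a direct `ByteBuffer` is our own assumption about one plausible design, not the paper's data structure. An array of objects must be copied element by element into a contiguous buffer before a device can read it; a customised array that keeps the data flat from the start needs no such copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Conventional Java representation: an array of small objects scattered on the heap.
final class Pair {
    final double x, y;
    Pair(double x, double y) { this.x = x; this.y = y; }
}

// Marshalling: every object is copied into one contiguous, native-order buffer.
// This O(n) copy is paid on every transfer to (and back from) the device.
final class Marshal {
    static DoubleBuffer toDeviceLayout(Pair[] pairs) {
        DoubleBuffer buf = ByteBuffer.allocateDirect(pairs.length * 2 * Double.BYTES)
                                     .order(ByteOrder.nativeOrder())
                                     .asDoubleBuffer();
        for (Pair p : pairs) {
            buf.put(p.x).put(p.y);
        }
        buf.flip(); // position = 0, limit = number of doubles written
        return buf;
    }
}

// Hypothetical customised array: pairs live flat and off-heap from the start,
// so the backing buffer could be handed to an accelerator without conversion.
final class FlatPairArray {
    private final DoubleBuffer data;
    private final int length;

    FlatPairArray(int length) {
        this.length = length;
        this.data = ByteBuffer.allocateDirect(length * 2 * Double.BYTES)
                              .order(ByteOrder.nativeOrder())
                              .asDoubleBuffer();
    }

    int length() { return length; }

    // Element i occupies slots 2*i (x) and 2*i + 1 (y): an interleaved layout.
    void set(int i, double x, double y) {
        data.put(2 * i, x);
        data.put(2 * i + 1, y);
    }

    double x(int i) { return data.get(2 * i); }
    double y(int i) { return data.get(2 * i + 1); }

    // View of the backing storage that would be passed directly to the device.
    DoubleBuffer deviceBuffer() { return data.duplicate(); }
}
```

Both paths end with the same memory layout; the difference is that `FlatPairArray` never materialises the intermediate objects, which is the kind of saving the abstract attributes to a customised array data structure.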
We evaluated our technique using a set of applications from different domains, including mathematical finance and machine learning. We achieve speedups of up to 500× over sequential and multi-threaded Java code when using an external GPU.