Baymax: QoS awareness and increased utilization for non-preemptive accelerators in warehouse scale computers

Q Chen, H Yang, J Mars, L Tang - ACM SIGPLAN Notices, 2016 - dl.acm.org
Modern warehouse-scale computers (WSCs) are being outfitted with accelerators to provide
the significant compute required by emerging intelligent personal assistant (IPA) workloads …

μLayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization

Y Kim, J Kim, D Chae, D Kim, J Kim - Proceedings of the Fourteenth …, 2019 - dl.acm.org
Emerging mobile services heavily utilize Neural Networks (NNs) to improve user
experiences. Such NN-assisted services depend on fast NN execution for high …

Understanding co-running behaviors on integrated CPU/GPU architectures

F Zhang, J Zhai, B He, S Zhang… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Architecture designers tend to integrate both CPUs and GPUs on the same chip to deliver
energy-efficient designs. It is still an open problem to effectively leverage the advantages of …

Study and evaluation of automatic GPU offloading method from various language applications

Y Yamato - International Journal of Parallel, Emergent and …, 2022 - Taylor & Francis
Heterogeneous hardware other than a small-core central processing unit (CPU) is
increasingly being used, such as a graphics processing unit (GPU), field-programmable …

Study and evaluation of improved automatic GPU offloading method

Y Yamato - International Journal of Parallel, Emergent and …, 2021 - Taylor & Francis
With the slowing down of Moore's law, the use of hardware other than CPUs, such as
graphics processing units (GPUs) or field-programmable gate arrays (FPGAs), is increasing …

Graphie: Large-scale asynchronous graph traversals on just a GPU

W Han, D Mawhirter, B Wu… - 2017 26th International …, 2017 - ieeexplore.ieee.org
Most GPU-based graph systems cannot handle large-scale graphs that do not fit in the GPU
memory. The ever-increasing graph size demands a scale-up graph system, which can run …

Regularized least absolute deviations regression and an efficient algorithm for parameter tuning

L Wang, MD Gordon, J Zhu - Sixth International Conference on …, 2006 - ieeexplore.ieee.org
Linear regression is one of the most important and widely used techniques for data analysis.
However, sometimes people are not satisfied with it because of the following two limitations …

Adaptive optimization for OpenCL programs on embedded heterogeneous systems

B Taylor, VS Marco, Z Wang - ACM SIGPLAN Notices, 2017 - dl.acm.org
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in
today's embedded systems. These architectures offer potential for energy efficient computing …

Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL

MA Dávila Guzmán, R Nozal, R Gran Tejero… - The Journal of …, 2019 - Springer
Heterogeneous systems are the core architecture of most of the high-performance
computing nodes, due to their excellent performance and energy efficiency. However, a key …

Simplifying programming and load balancing of data parallel applications on heterogeneous systems

B Pérez, JL Bosque, R Beivide - … of the 9th Annual Workshop on General …, 2016 - dl.acm.org
Heterogeneous architectures have experienced a great development thanks to their
excellent cost/performance ratio and low power consumption. But heterogeneity significantly …