Reducing GPU offload latency via fine-grained CPU-GPU synchronization

D Lustig, M Martonosi - 2013 IEEE 19th International …, 2013 - ieeexplore.ieee.org
GPUs are seeing increasingly widespread use for general purpose computation due to their
excellent performance for highly-parallel, throughput-oriented applications. For many …

The effect of communication and synchronization on Amdahl's law in multicore systems

L Yavits, A Morad, R Ginosar - Parallel Computing, 2014 - Elsevier
This work analyses the effects of sequential-to-parallel synchronization and inter-core
communication on multicore performance, speedup and scaling from Amdahl's law …

ZoneDefense: A fault-tolerant routing for 2-D meshes without virtual channels

B Fu, Y Han, H Li, X Li - IEEE Transactions on Very Large Scale …, 2013 - ieeexplore.ieee.org
Fault-tolerant routing is usually used to provide reliable on-chip communication for many-core processors. This paper focuses on a special class of algorithms that do not use virtual …

Footprint: Regulating routing adaptiveness in networks-on-chip

B Fu, J Kim - Proceedings of the 44th Annual International …, 2017 - dl.acm.org
Routing algorithms can improve network performance by maximizing routing adaptiveness
but can be problematic in the presence of endpoint congestion. Tree-saturation is a well …

DRLAR: A deep reinforcement learning-based adaptive routing framework for network-on-chips

S Wang, X Zhang, C Wang, K Wu, C Li, D Dong - Computer Networks, 2024 - Elsevier
Adaptive routing plays a pivotal role in the overall performance of Network-on-Chips (NoCs).
However, with many-core architectures supporting complex and constantly changing traffic …

An operating system for safety-critical applications on manycore processors

F Kluge, M Gerdes, T Ungerer - 2014 IEEE 17th International …, 2014 - ieeexplore.ieee.org
Processor technology is advancing from bus-based multicores to network-on-chip-based
many cores, posing new challenges for operating system design. In this paper, we discuss …

Shared-resource-centric limited preemptive scheduling: A comprehensive study of suspension-based partitioning approaches

Z Dong, C Liu, S Bateni, KH Chen… - 2018 IEEE Real …, 2018 - ieeexplore.ieee.org
This paper studies the problem of scheduling a set of hard real-time sporadic tasks that may
access CPU cores and a shared resource. Motivated by the observation that the CPU …

Extendable pattern-oriented optimization directives

H Cui, J Xue, L Wang, Y Yang, X Feng… - ACM Transactions on …, 2012 - dl.acm.org
Algorithm-specific, that is, semantic-specific optimizations have been observed to bring
significant performance gains, especially for a diverse set of multi/many-core architectures …

Godson-T: An efficient many-core processor exploring thread-level parallelism

D Fan, H Zhang, D Wang, X Ye, F Song, G Li… - IEEE Micro, 2012 - ieeexplore.ieee.org
Godson-T is a research many-core processor designed for parallel scientific computing that
delivers efficient performance and flexible programmability simultaneously. It also has many …

Fault-tolerant network-on-chip

X Li, G Yan, C Liu - Built-in Fault-Tolerant Computing Paradigm for …, 2023 - Springer
Manycore systems are emerging for tera-scale computation and typically utilize Network-on-Chip (NoC) as the communication fabrics between the cores. Since a single routing node …