Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication

T Krishna, LS Peh, BM Beckmann… - Proceedings of the 44th …, 2011 - dl.acm.org
The prevalence of multicore architectures has accentuated the need for scalable cache
coherence solutions. Many of the proposed designs use a mix of 1-to-1, 1-to-many (1-to-M) …

TLSync: support for multiple fast barriers using on-chip transmission lines

J Oh, M Prvulovic, A Zajic - ACM SIGARCH Computer Architecture News, 2011 - dl.acm.org
As the number of cores on a single chip grows, scalable barrier synchronization becomes
increasingly difficult to implement. In software implementations, such as the tournament …

Model-based analysis of Chinese calligraphy images

STS Wong, H Leung, HHS Ip - Computer Vision and Image Understanding, 2008 - Elsevier
A lot of research and development have been done on producing Chinese fonts with smooth
outlines and solid colouring for computer displays and printing. Existing Chinese fonts …

Single-cycle collective communication over a shared network fabric

T Krishna, LS Peh - … Symposium on Networks-on-Chip (NoCS), 2014 - ieeexplore.ieee.org
In the multicore era, on-chip network latency and throughput have a direct impact on system
performance. A highly important class of communication flows traversing the network is …

An OpenMP* Barrier Using SIMD Instructions for Intel® Xeon Phi™ Coprocessor

D Caballero, A Duran, X Martorell - OpenMP in the Era of Low Power …, 2013 - Springer
Barrier synchronisation is a widely-studied topic since the supercomputer era due to its
significant impact on the overall performance of parallel applications. With the current shift to …

Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnect

J Oh, A Zajic, M Prvulovic - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org
Growth in core count creates an increasing demand for interconnect bandwidth, driving a
change from shared buses to packet-switched on-chip interconnects. However, this …

Improving GPU Memory Performance with Artificial Barrier Synchronization

SH Lo, CR Lee, QL Kao, IH Chung… - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
Barrier synchronization, an essential mechanism for a block of threads to guard data
consistency, is regarded as a threat to performance. This study, however, provides a …

Non-blocking technique for parallel algorithms with global barrier synchronization

A Garza, CA Parra, ID Scherson - … International Conference on …, 2021 - ieeexplore.ieee.org
Sharing data among asynchronous processes is considered to be a hard systems problem
in multithreaded modern shared-memory multicore systems. Throughout the literature …

Enabling dedicated single-cycle connections over a shared network-on-chip

T Krishna - 2014 - dspace.mit.edu
Adding multiple processing cores on the same chip has become the de facto design choice
as we continue extracting more and more performance/watt from our chips in every …

Photonic-based express coherence notifications for many-core CMPs

JL Abellán, E Padierna, A Ros, ME Acacio - Journal of Parallel and …, 2018 - Elsevier
Directory-based coherence protocols (Directory) are considered the design of choice to
provide maximum performance in coherence maintenance for shared-memory many-core …