- Academic resource search

Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication

T Krishna, LS Peh, BM Beckmann… - Proceedings of the 44th …, 2011 - dl.acm.org

The prevalence of multicore architectures has accentuated the need for scalable cache
coherence solutions. Many of the proposed designs use a mix of 1-to-1, 1-to-many (1-to-M) …

Cited by 125 · Related articles · All 10 versions

[PDF] gatech.edu

TLSync: support for multiple fast barriers using on-chip transmission lines

J Oh, M Prvulovic, A Zajic - ACM SIGARCH Computer Architecture News, 2011 - dl.acm.org

As the number of cores on a single-chip grows, scalable barrier synchronization becomes
increasingly difficult to implement. In software implementations, such as the tournament …

Cited by 59 · Related articles · All 12 versions

Model-based analysis of Chinese calligraphy images

STS Wong, H Leung, HHS Ip - Computer Vision and Image Understanding, 2008 - Elsevier

A lot of research and development have been done on producing Chinese fonts with smooth
outlines and solid colouring for computer displays and printing. Existing Chinese fonts …

Cited by 56 · Related articles · All 9 versions

[PDF] gatech.edu

Single-cycle collective communication over a shared network fabric

T Krishna, LS Peh - … Symposium on Networks-on-Chip (NoCS), 2014 - ieeexplore.ieee.org

In the multicore era, on-chip network latency and throughput have a direct impact on system
performance. A highly important class of communication flows traversing the network is …

Cited by 37 · Related articles · All 5 versions

An OpenMP* Barrier Using SIMD Instructions for Intel® Xeon Phi™ Coprocessor

D Caballero, A Duran, X Martorell - OpenMP in the Era of Low Power …, 2013 - Springer

Barrier synchronisation is a widely-studied topic since the supercomputer era due to its
significant impact on the overall performance of parallel applications. With the current shift to …

Cited by 23 · Related articles · All 4 versions

[PDF] gatech.edu

Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnect

J Oh, A Zajic, M Prvulovic - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org

Growth in core count creates an increasing demand for interconnect bandwidth, driving a
change from shared buses to packet-switched on-chip interconnects. However, this …

Cited by 16 · Related articles · All 10 versions

[PDF] nthu.edu.tw

Improving GPU Memory Performance with Artificial Barrier Synchronization

SH Lo, CR Lee, QL Kao, IH Chung… - IEEE transactions on …, 2013 - ieeexplore.ieee.org

Barrier synchronization, an essential mechanism for a block of threads to guard data
consistency, is regarded as a threat to performance. This study, however, provides a …

Cited by 12 · Related articles · All 4 versions

Non-blocking technique for parallel algorithms with global barrier synchronization

A Garza, CA Parra, ID Scherson - … International Conference on …, 2021 - ieeexplore.ieee.org

Sharing data among asynchronous processes is considered to be a hard systems problem
in multithreaded modern shared-memory multicore systems. Throughout the literature …

Cited by 3 · Related articles · All 2 versions

[PDF] mit.edu

Enabling dedicated single-cycle connections over a shared network-on-chip

T Krishna - 2014 - dspace.mit.edu

Adding multiple processing cores on the same chip has become the de facto design choice
as we continue extracting more and more performance/watt from our chips in every …

Cited by 6 · Related articles

[PDF] researchgate.net

Photonic-based express coherence notifications for many-core CMPs

JL Abellán, E Padierna, A Ros, ME Acacio - Journal of Parallel and …, 2018 - Elsevier

Directory-based coherence protocols (Directory) are considered the design of choice to
provide maximum performance in coherence maintenance for shared-memory many-core …

Cited by 6 · Related articles · All 6 versions
