Swizzle-switch networks for many-core systems

K Sewell, RG Dreslinski, T Manville… - IEEE Journal on …, 2012 - ieeexplore.ieee.org
This work revisits the design of crossbar and high-radix interconnects in light of advances in
circuit and layout techniques that improve crossbar scalability, obviating the need for deep …

Knowledge patterns and sources of leadership: Mapping the semiconductor miniaturization trajectory

M Epicoco - Research Policy, 2013 - Elsevier
This article examines the technological capabilities that national organizations generated
and accumulated throughout the long-term evolution of the miniaturization trajectory, the …

Crossbar NoCs are scalable beyond 100 nodes

G Passas, M Katevenis… - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
We describe the design and layout of a radix-128 crossbar in 90 nm CMOS. The data path is
32 bits wide and runs at 750 MHz using a three-stage pipeline, while fitting in a silicon area …

Scaling towards kilo-core processors with asymmetric high-radix topologies

N Abeyratne, R Das, Q Li, K Sewell… - 2013 IEEE 19th …, 2013 - ieeexplore.ieee.org
In this paper, we explore the challenges in scaling on-chip networks towards kilo-core
processors. Current low-radix topologies optimize for fast local communication, but do not …

Cambricon-R: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation

X Song, Y Wen, X Hu, T Liu, H Zhou, H Han… - Proceedings of the 56th …, 2023 - dl.acm.org
Neural scene representation (NSR) initiates a new methodology of encoding a 3D scene
with neural networks by learning from dozens of photos taken from different camera …

Mesh-of-trees and alternative interconnection networks for single-chip parallelism

AO Balkan, G Qu, U Vishkin - IEEE Transactions on Very Large …, 2009 - ieeexplore.ieee.org
In single-chip parallel processors, it is crucial to implement a high-throughput low-latency
interconnection network to connect the on-chip components, especially the processing units …

A 128 x 128 x 24Gb/s crossbar interconnecting 128 tiles in a single hop and occupying 6% of their area

G Passas, M Katevenis… - 2010 Fourth ACM/IEEE …, 2010 - ieeexplore.ieee.org
We describe the implementation of a 128×128 crossbar switch in 90 nm CMOS standard-cell
ASIC technology. The crossbar operates at 750 MHz and is 32-bits for a port capacity …

A Data‐Flow Soft‐Core Processor for Accelerating Scientific Calculation on FPGAs

L Verdoscia, R Giorgi - Mathematical Problems in Engineering, 2016 - Wiley Online Library
We present a new type of soft‐core processor called the “Data‐Flow Soft‐Core” that can be
implemented through FPGA technology with adequate interconnect resources. This …

Electro-photonic NoC designs for kilocore systems

JL Abellán, C Chen, A Joshi - ACM Journal on Emerging Technologies …, 2016 - dl.acm.org
The increasing core count in manycore systems requires a corresponding large
Network-on-chip (NoC) bandwidth to support the overlying applications. However, it is not possible to …

Cache capacity aware thread scheduling for irregular memory access on many-core GPGPUs

HK Kuo, TK Yen, BCC Lai… - 2013 18th Asia and South …, 2013 - ieeexplore.ieee.org
On-chip shared cache is effective to alleviate the memory bottleneck in modern many-core
systems, such as GPGPUs. However, when scheduling numerous concurrent threads on a …