PROMISE: An end-to-end design of a programmable mixed-signal accelerator for machine-learning algorithms

P Srivastava, M Kang, SK Gonugondla… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Analog/mixed-signal machine learning (ML) accelerators exploit the unique computing
capability of analog/mixed-signal circuits and inherent error tolerance of ML algorithms to …

Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement

BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org
Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …

KrakenOnMem: a memristor-augmented HW/SW framework for taxonomic profiling

T Shahroodi, M Zahedi, A Singh, S Wong… - Proceedings of the 36th …, 2022 - dl.acm.org
State-of-the-art taxonomic profilers that comprise the first step in larger-context metagenomic
studies have proven to be computationally intensive, i.e., while accurate, they come at the …

Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

R Nadig, M Sadrosadati, H Mao, NM Ghiasi… - Proceedings of the 50th …, 2023 - dl.acm.org
The performance and capacity of solid-state drives (SSDs) are continuously improving to
meet the increasing demands of modern data-intensive applications. Unfortunately …

Wire-aware architecture and dataflow for CNN accelerators

S Gudaparthi, S Narayanan… - Proceedings of the …, 2019 - dl.acm.org
In spite of several recent advancements, data movement in modern CNN accelerators
remains a significant bottleneck. Architectures like Eyeriss implement large scratchpads …

An eight-core RISC-V processor with compute near last level cache in Intel 4 CMOS

GK Chen, PC Knag, C Tokunaga… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
An eight-core 64-b processor extends RISC-V to perform multiply–accumulate (MAC) within
the shared last level cache (LLC). Instead of moving data from the LLC to the core, compute …

Stream floating: Enabling proactive and decentralized cache optimizations

Z Wang, J Weng, J Lowe-Power, J Gaur… - … Symposium on High …, 2021 - ieeexplore.ieee.org
As multicore systems continue to grow in scale and on-chip memory capacity, the on-chip
network bandwidth and latency become problematic bottlenecks. Because of this …

A high throughput in-MRAM-computing scheme using hybrid p-SOT-MTJ/GAA-CNTFET

Z Tong, Y Xu, Y Liu, X Duan, H Tang… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Silicon-based semiconductor transistors are approaching their physical limits due to
shrinking feature sizes. Simultaneously, traditional silicon-based von Neumann …

A survey of memory-centric energy efficient computer architecture

C Zhang, H Sun, S Li, Y Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Energy efficient architecture is essential to improve both the performance and power
consumption of a computer system. However, modern computers suffer from the severe …

Rebooting virtual memory with midgard

S Gupta, A Bhattacharyya, Y Oh… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Computer systems designers are building cache hierarchies with higher capacity to capture
the ever-increasing working sets of modern workloads. Cache hierarchies with higher …