Balanced loop retiming to effectively architect STT-RAM-based hybrid cache for VLIW processors

K Qiu, W Zhang, X Wu, X Zhu, J Wang, Y Xu… - Proceedings of the 31st …, 2016 - dl.acm.org
Loop retiming has been extensively studied to maximize instruction-level parallelism (ILP) of
multiple function units by rearranging the dependence delays in a uniform loop. Recently …

BRLoop: Constructing balanced retimed loop to architect STT-RAM-based hybrid cache for VLIW processors

K Qiu, Y Zhu, Y Xu, Q Huo, CJ Xue - Microelectronics Journal, 2019 - Elsevier
The emerging non-volatile memory technology of Spin-Torque Transfer RAM (STT-RAM)
has been proposed as a replacement for SRAM-based caches. Recently its commercial …

Migration-aware loop retiming for STT-RAM-based hybrid cache in embedded systems

K Qiu, M Zhao, Q Li, C Fu… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
Recently, a hybrid cache architecture consisting of both spin-transfer torque RAM (STT-RAM)
and SRAM has been proposed for energy efficiency. In hybrid caches, migration-based …

Exploring adaptive cache for reconfigurable VLIW processor

S Hu, J Huang - IEEE Access, 2019 - ieeexplore.ieee.org
In this paper, we focus on a very long instruction word (VLIW) processor design that “shares”
its cache blocks when switching to different performance modes to alleviate the …

DAM: Deadblock aware migration techniques for STT-RAM-based hybrid caches

A Sarkar, N Singh, V Venkitaraman… - IEEE Computer …, 2021 - ieeexplore.ieee.org
Last Level Caches (LLCs) play a critical role in reducing the number of costly off-chip
memory accesses. Hence, there is a demand to make the LLCs larger to meet the …

Migration-aware loop retiming for STT-RAM based hybrid cache for embedded systems

K Qiu, M Zhao, C Fu, L Shi… - 2013 IEEE 24th …, 2013 - ieeexplore.ieee.org
In a hybrid cache architecture consisting of both STT-RAM and SRAM, migration-based
techniques have been proposed. The migration technique dynamically moves write …

Large-reach memory management unit caches: Coalesced and shared memory management unit caches to accelerate TLB miss handling

A Bhattacharjee - 2013 46th Annual IEEE/ACM International Symposium … - infona.pl
Within the ever-important memory hierarchy, little research is devoted to Memory
Management Unit (MMU) caches, implemented in modern processors to accelerate …

Read-tuned STT-RAM and eDRAM cache hierarchies for throughput and energy optimization

N Khoshavi, RF Demara - IEEE Access, 2018 - ieeexplore.ieee.org
As capacity and complexity of on-chip cache memory hierarchy increases, the service cost to
the critical loads from last level cache (LLC), which are frequently repeated, has become a …

Reducing latency in an SRAM/DRAM cache hierarchy via a novel tag-cache architecture

F Hameed, L Bauer, J Henkel - Proceedings of the 51st Annual Design …, 2014 - dl.acm.org
Memory speed has become a major performance bottleneck as more and more cores are
integrated on a multi-core chip. The widening latency gap between high speed cores and …

Effective TLB thrashing: unveiling the true short reach of modern TLB designs

AR Hernández C, WM Lin - Proceedings of the 37th ACM/SIGAPP …, 2022 - dl.acm.org
The Memory Management Unit (MMU) in modern processors now includes a Translation
Lookaside Buffer (TLB) that caches recently-used Page-Table Entries (PTEs), and prevents …