CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions

S Singh, J Feliu, ME Acacio… - 2023 32nd …, 2023 - ieeexplore.ieee.org
Efficient Total Store Order (TSO) implementations allow loads to execute speculatively
out-of-order. To detect order violations, the load queue (LQ) holds all the in-flight loads and is …

DMDC: delayed memory dependence checking through age-based filtering

F Castro, L Pinuel, D Chaver, M Prieto… - 2006 39th Annual …, 2006 - ieeexplore.ieee.org
One of the main challenges of modern processor design is the implementation of a scalable
and efficient mechanism to detect memory access order violations as a result of out-of-order …

Federation: Boosting per-thread performance of throughput-oriented manycore architectures

M Boyer, D Tarjan, K Skadron - ACM Transactions on Architecture and …, 2010 - dl.acm.org
Manycore architectures designed for parallel workloads are likely to use simple, highly
multithreaded, in-order cores. This maximizes throughput, but only with enough threads to …

Federation: Out-of-order execution using simple in-order cores

D Tarjan, M Boyer, K Skadron - … Science., Tech. Report CS-2007-11 …, 2007 - cs.virginia.edu
Manycore architectures with dozens, hundreds, or thousands of threads are likely to use
single-issue, in-order execution cores with simple pipelines but multiple thread contexts per …

Seed: scalable, efficient enforcement of dependences

FJ Mesa-Martínez, MC Huang, J Renau - Proceedings of the 15th …, 2006 - dl.acm.org
Instruction issue logic is a critical component in modern high-performance out-of-order
processors. The ever increasing latencies found in modern processors, mostly associated …

Memory disambiguation hardware: a review

F Castro, D Chaver, L Piñuel, M Prieto… - Journal of Computer …, 2008 - sedici.unlp.edu.ar
One of the main challenges of modern processor designs is the implementation of scalable
and efficient mechanisms to detect memory access order violations as a result of out-of …

Microarchitectural Optimizations for an Efficient Utilization of Processor Resources

S Singh - 2024 - dialnet.unirioja.es
Over time, the complexity of computer hardware has steadily increased,
always in pursuit of higher performance. There are several …

Replacing associative load queues: a timing-centric approach

F Castro, R Noor, A Garg, D Chaver… - IEEE Transactions …, 2008 - ieeexplore.ieee.org
One of the main challenges of modern processor design is the implementation of a scalable
and efficient mechanism to detect memory access order violations as a result of out-of-order …

Using age registers for a simple load–store queue filtering

F Castro, D Chaver, L Piñuel, M Prieto… - Journal of Systems …, 2009 - Elsevier
One of the main challenges of modern processor design is the implementation of a scalable
and efficient mechanism to detect memory access order violations as a result of out-of-order …

[Book] Exploring performance-correctness explicitly-decoupled architectures

A Garg - 2011 - search.proquest.com
Optimizing the common case has been an adage in decades of processor design practices.
However, as the system complexity and optimization techniques' sophistication have …