- Academic resource search

Profiling a warehouse-scale computer

S Kanev, JP Darago, K Hazelwood… - Proceedings of the …, 2015 - dl.acm.org

With the increasing prevalence of warehouse-scale (WSC) and cloud computing,
understanding the interactions of server applications with the underlying microarchitecture …

Cited by 607 - Related articles - All 18 versions

[PDF] cam.ac.uk

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org

We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Cited by 12 - Related articles - All 9 versions

[PDF] acm.org

APPROX-NoC: A data approximation framework for network-on-chip architectures

R Boyapati, J Huang, P Majumder, KH Yum… - Proceedings of the 44th …, 2017 - dl.acm.org

The trend of unsustainable power consumption and large memory bandwidth demands in
massively parallel multicore systems, with the advent of the big data era, has brought upon …

Cited by 94 - Related articles - All 9 versions

[PDF] cam.ac.uk

An event-triggered programmable prefetcher for irregular workloads

S Ainsworth, TM Jones - ACM Sigplan Notices, 2018 - dl.acm.org

Many modern workloads compute on large amounts of data, often with irregular memory
accesses. Current architectures perform poorly for these workloads, as existing prefetching …

Cited by 85 - Related articles - All 9 versions

[PDF] cwi.nl

Beyond the wall: Near-data processing for databases

SL Xi, A Augusta, M Athanassoulis… - Proceedings of the 11th …, 2015 - dl.acm.org

The continuous growth of main memory size allows modern data systems to process entire
large scale datasets in memory. The increase in memory capacity, however, is not matched …

Cited by 107 - Related articles - All 15 versions

[PDF] cam.ac.uk

Graph prefetching using data structure knowledge

S Ainsworth, TM Jones - … of the 2016 International Conference on …, 2016 - dl.acm.org

Searches on large graphs are heavily memory latency bound, as a result of many high
latency DRAM accesses. Due to the highly irregular nature of the access patterns involved …

Cited by 100 - Related articles - All 10 versions

[PDF] cam.ac.uk

Software prefetching for indirect memory accesses

S Ainsworth, TM Jones - 2017 IEEE/ACM International …, 2017 - ieeexplore.ieee.org

Many modern data processing and HPC workloads are heavily memory-latency bound. A
tempting proposition to solve this is software prefetching, where special non-blocking loads …

Cited by 82 - Related articles - All 14 versions

[PDF] mit.edu

Pipette: Improving core utilization on irregular applications through intra-core pipeline parallelism

QM Nguyen, D Sanchez - 2020 53rd Annual IEEE/ACM …, 2020 - ieeexplore.ieee.org

Applications with irregular memory accesses and control flow, such as graph algorithms and
sparse linear algebra, use high-performance cores very poorly and suffer from dismal IPC …

Cited by 36 - Related articles - All 6 versions

[PDF] nsf.gov

SpZip: Architectural support for effective data compression in irregular applications

Y Yang, JS Emer, D Sanchez - 2021 ACM/IEEE 48th Annual …, 2021 - ieeexplore.ieee.org

Irregular applications, such as graph analytics and sparse linear algebra, exhibit frequent
indirect, data-dependent accesses to single or short sequences of elements that cause high …

Cited by 26 - Related articles - All 9 versions

[PDF] acm.org

LogCA: A high-level performance model for hardware accelerators

MSB Altaf, DA Wood - ACM SIGARCH Computer Architecture News, 2017 - dl.acm.org

With the end of Dennard scaling, architects have increasingly turned to special-purpose
hardware accelerators to improve the performance and energy efficiency for some …

Cited by 53 - Related articles - All 11 versions
