PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture

J Ahn, S Yoo, O Mutlu, K Choi - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org
Processing-in-memory (PIM) is rapidly rising as a viable solution for the memory wall crisis,
rebounding from its unsuccessful attempts in the 1990s due to practicality concerns, which are …

Quasar: Resource-efficient and QoS-aware cluster management

C Delimitrou, C Kozyrakis - ACM SIGPLAN Notices, 2014 - dl.acm.org
Cloud computing promises flexibility and high performance for users and high cost-efficiency
for operators. Nevertheless, most cloud facilities operate at very low utilization, hurting both …

Paragon: QoS-aware scheduling for heterogeneous datacenters

C Delimitrou, C Kozyrakis - ACM SIGPLAN Notices, 2013 - dl.acm.org
Large-scale datacenters (DCs) host tens of thousands of diverse applications each day.
However, interference between colocated workloads and the difficulty to match applications …

[BOOK][B] Benchmarking modern multiprocessors

C Bienia - 2011 - search.proquest.com
Benchmarking has become one of the most important methods for quantitative performance
evaluation of processor and computer system designs. Benchmarking of modern …

The PARSEC benchmark suite: Characterization and architectural implications

C Bienia, S Kumar, JP Singh, K Li - Proceedings of the 17th international …, 2008 - dl.acm.org
This paper presents and characterizes the Princeton Application Repository for Shared-
Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors …

AxBench: A multiplatform benchmark suite for approximate computing

A Yazdanbakhsh, D Mahajan… - IEEE Design & …, 2016 - ieeexplore.ieee.org
Approximate computing is claimed to be a powerful knob for alleviating the peak power and
energy-efficiency issues. However, providing a consistent benchmark suite with diverse …

STAMP: Stanford transactional applications for multi-processing

CC Minh, JW Chung, C Kozyrakis… - 2008 IEEE International …, 2008 - ieeexplore.ieee.org
Transactional Memory (TM) is emerging as a promising technology to simplify parallel
programming. While several TM systems have been proposed in the research literature, we …

Improving GPU performance via large warps and two-level warp scheduling

V Narasiman, M Shebanow, CJ Lee… - Proceedings of the 44th …, 2011 - dl.acm.org
Due to their massive computational power, graphics processing units (GPUs) have become
a popular platform for executing general purpose parallel applications. GPU programming …

Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques

H Zhang, H Hoffmann - ACM SIGPLAN Notices, 2016 - dl.acm.org
Power and thermal dissipation constrain multicore performance scaling. Modern processors
are built such that they could sustain damaging levels of power dissipation, creating a need …

Firefly: Illuminating future network-on-chip with nanophotonics

Y Pan, P Kumar, J Kim, G Memik, Y Zhang… - Proceedings of the 36th …, 2009 - dl.acm.org
Future many-core processors will require high-performance yet energy-efficient on-chip
networks to provide a communication substrate for the increasing number of cores. Recent …