Mighty-morphing power-SIMD

K Han, J Ahn, K Choi - ACM Transactions on Architecture and Code …, 2013 - dl.acm.org

Coarse-grained reconfigurable architecture typically has an array of processing elements
which are controlled by a centralized unit. This makes it difficult to execute programs having …

Cited by 47 · Related articles

Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability

Y Park, JJK Park, H Park… - 2012 45th Annual IEEE …, 2012 - ieeexplore.ieee.org

Mobile computing as exemplified by the smart phone has become an integral part of our
daily lives. The next generation of these devices will be driven by providing an even richer …

Cited by 44 · Related articles · All 11 versions

NoMAD-Attention: Efficient LLM inference on CPUs through multiply-add-free attention

T Zhang, JW Yi, B Yao, Z Xu, A Shrivastava - arXiv preprint arXiv …, 2024 - arxiv.org

Large language model inference on Central Processing Units (CPU) is challenging due to
the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention …

Cited by 5 · Related articles · All 2 versions

Occamy: Elastically sharing a SIMD co-processor across multiple CPU cores

Z Zhang, Y Ou, Y Liu, C Wang, Y Zhou… - Proceedings of the 28th …, 2023 - dl.acm.org

SIMD extensions are widely adopted in multi-core processors to exploit data-level
parallelism. However, when co-running workloads on different cores, compute-intensive …

Cited by 4 · Related articles · All 2 versions

Exploiting tightly-coupled cores

D Bates, A Bradbury, A Koltes, R Mullins - Journal of Signal Processing …, 2015 - Springer

The individual processors of a chip-multiprocessor traditionally have rigid boundaries. Inter-
core communication is only possible via memory, and control over a core's resources is …

Cited by 18 · Related articles · All 16 versions

Construction and exploitation of VLIW ASIPs with heterogeneous vector-widths

E Diken, R Jordans, R Corvino, L Jóźwiak… - Microprocessors and …, 2014 - Elsevier

Numerous applications in important domains, such as communication and multimedia, show
a significant data-level parallelism (DLP). A large part of the DLP is usually exploited …

Cited by 16 · Related articles · All 4 versions

Exploiting both pipelining and data parallelism with SIMD reconfigurable architecture

Y Kim, J Lee, J Lee, TX Mai, I Heo, Y Paek - International Symposium on …, 2012 - Springer

Reconfigurable Architecture (RA), which provides extremely high energy efficiency for
certain domains of applications, have one problem that current mapping algorithms for it do …

Cited by 14 · Related articles · All 9 versions

LRMDCR: A learner's role-based multi dimensional collaborative recommendation for group learning support

X Wan, T Ninomiya, T Okamoto - 2008 Eighth IEEE …, 2008 - ieeexplore.ieee.org

In order to improve the "educational provision" to implement the e-learning
recommender system, we propose a new recommendation approach which has been …

Cited by 14 · Related articles · All 4 versions

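[CITATION][C] Overview of SIMD auto-vectorization compiler optimization

W Gao, R Zhao, L Han, J Pang, R Ding - Journal of Software (软件学报), 2015

Cited by 16 · Related articles · All 6 versions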

Dual-Core Framework: Eliminating the Bottleneck Effect of Scalar Kernels on SIMD Architectures

Y Wang, S Chen, H Chen, J Wan… - … on Information and …, 2013 - search.ieice.org

The efficiency of ubiquitous SIMD (Single Instruction Multiple Data) media processors is
seriously limited by the bottleneck effect of the scalar kernels in media applications. To solve …

Cited by 5 · Related articles · All 8 versions

Power-efficient predication techniques for acceleration of control flow execution on CGRA