Vyasa: A high-performance vectorizing compiler for tensor convolutions on the Xilinx AI Engine

P Chatarasi, S Neuendorffer, S Bayliss… - 2020 IEEE High …, 2020 - ieeexplore.ieee.org
Xilinx's AI Engine is a recent industry example of energy-efficient vector processing that
includes novel support for 2D SIMD datapaths and shuffle interconnection network. The …

A 1080p H.264/AVC baseline residual encoder for a fine-grained many-core system

Z Xiao, BM Baas - IEEE transactions on circuits and systems for …, 2011 - ieeexplore.ieee.org
This paper presents a baseline residual encoder for H.264/AVC on a programmable
fine-grained many-core processing array that utilizes no application-specific hardware. The …

Greening the Video Transcoding Service with Low-Cost Hardware Transcoders

P Liu, J Yoon, L Johnson, S Banerjee - 2016 USENIX Annual Technical …, 2016 - usenix.org
Video transcoding plays a critical role in a video streaming service. Content owners and
publishers need video transcoders to adapt their videos to different formats, bitrates, and …

Improved SIMD architecture for high performance video processors

WY Lo, DPK Lun, WC Siu, W Wang… - IEEE transactions on …, 2011 - ieeexplore.ieee.org
Single instruction multiple data (SIMD) execution is in no doubt an efficient way to exploit the
data level parallelism in image and video applications. However, SIMD execution …

[PDF] Architecture and Advantages of SIMD in Multimedia Applications

SM Al-sudany, AS Al-Araji… - Journal of Xi'an University …, 2020 - researchgate.net
In this paper, we identified the single instruction multi-data architecture (SIMD) that is a
method of computing parallelism. Most modern processor designs contain SIMD in order to …

[PDF] Improving Compute & Data Efficiency of Flexible Architectures

LJW Waeijen - 2022 - research.tue.nl
Man's ambition to construct machinery that can match, or even exceed, his own intelligence
has driven over half a century of research into computer architectures. Enabled by an …

Reduction operator for wide-SIMDs reconsidered

L Waeijen, D She, H Corporaal, Y He - Proceedings of the 51st Annual …, 2014 - dl.acm.org
It has been shown that wide Single Instruction Multiple Data architectures (wide-SIMDs) can
achieve high energy efficiency, especially in domains such as image and vision processing …

Exploration of full HD media decoding on a software defined radio baseband processor

C Mei, M Li, P Cao, A Amin, C Li, J Yang… - IEEE transactions on …, 2013 - ieeexplore.ieee.org
Recently, various specialized Software Defined Radio (SDR) baseband processors have
been proposed for meeting the high performance and programmability requirements for …

[PDF] Advancing Compiler Optimizations for General-Purpose & Domain-Specific Parallel Architectures

P Chatarasi - 2020 - pchath.github.io
A key challenge for optimizing compilers is to keep up with the increasing complexity related
to locality and parallelism in modern computers, especially as computer vendors head …

[BOOK] Energy-efficient fine-grained many-core architecture for video and DSP applications

Z Xiao - 2012 - search.proquest.com
Many-core processor architecture has become the most promising computer architecture.
However, how to utilize the extra system performance for real applications such as video …