Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

Vision-based autonomous bolt-looseness detection method for splice connections: Design, lab-scale evaluation, and field application

TC Huynh - Automation in Construction, 2021 - Elsevier
This study presents a novel autonomous vision-based bolt-looseness detection method for
splice bolted connections. The method is sequentially designed with a Faster regional …

Aha: An agile approach to the design of coarse-grained reconfigurable accelerators and compilers

K Koul, J Melchert, K Sreedhar, L Truong… - ACM Transactions on …, 2023 - dl.acm.org
With the slowing of Moore's law, computer architects have turned to domain-specific
hardware specialization to continue improving the performance and efficiency of computing …

Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators

P Chatarasi, H Kwon, A Parashar, M Pellauer… - ACM Transactions on …, 2021 - dl.acm.org
A spatial accelerator's efficiency depends heavily on both its mapper and cost models to
generate optimized mappings for various operators of DNN models. However, existing cost …

Unified buffer: Compiling image processing and machine learning applications to push-memory accelerators

Q Liu, J Setter, D Huff, M Strange, K Feng… - ACM Transactions on …, 2023 - dl.acm.org
Image processing and machine learning applications benefit tremendously from hardware
acceleration. Existing compilers target either FPGAs, which sacrifice power and performance …

An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs

F Qararyah, MW Azhar, P Trancoso - ACM Transactions on Architecture …, 2024 - dl.acm.org
Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention.
These CNNs have relatively low computational and memory requirements. A common …

Quantune: Post-training quantization of convolutional neural networks using extreme gradient boosting for fast deployment

J Lee, M Yu, Y Kwon, T Kim - Future Generation Computer Systems, 2022 - Elsevier
To adopt convolutional neural networks (CNN) for a range of resource-constrained targets, it
is necessary to compress the CNN models by performing quantization, whereby precision …

Tensorflow to cloud FPGAs: Tradeoffs for accelerating deep neural networks

S Hadjis, K Olukotun - 2019 29th International Conference on …, 2019 - ieeexplore.ieee.org
We present the first open-source TensorFlow to FPGA tool capable of running state-of-the-
art DNNs. Running TensorFlow on the Amazon cloud FPGA instances, we provide …

Fibha: fixed budget hybrid CNN accelerator

F Qararyah, MW Azhar… - 2022 IEEE 34th …, 2022 - ieeexplore.ieee.org
Seeking the “sweet spot” in the accuracy-efficiency trade-off is increasing the heterogeneity
of state-of-the-art Convolutional Neural Networks (CNNs). Such CNN models exhibit …

Transparent compiler and runtime specializations for accelerating managed languages on FPGAs

M Papadimitriou, J Fumero, A Stratikopoulos… - arXiv preprint arXiv …, 2020 - arxiv.org
In recent years, heterogeneous computing has emerged as the vital way to increase
computers' performance and energy efficiency by combining diverse hardware devices …