Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, J Gómez-Luna… - ACM SIGMETRICS …, 2022 - dl.acm.org
Several manufacturers have already started to commercialize near-bank Processing-In-
Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …

Gearbox: A case for supporting accumulation dispatching and hybrid partitioning in PIM-based accelerators

M Lenjani, A Ahmed, M Stan, K Skadron - Proceedings of the 49th …, 2022 - dl.acm.org
Processing-in-memory (PIM) minimizes data movement overheads by placing processing
units near each memory segment. Recent PIMs employ processing units with a SIMD …

Asynchronous automata processing on GPUs

H Liu, S Pai, A Jog - Proceedings of the ACM on Measurement and …, 2023 - dl.acm.org
Finite-state automata serve as compute kernels for many application domains such as
pattern matching and data analytics. Existing approaches on GPUs exploit three levels of …

Memory-based computing for energy-efficient AI: Grand challenges

F Karimzadeh, M Imani, B Asgari, N Cao… - 2023 IFIP/IEEE 31st …, 2023 - ieeexplore.ieee.org
The remarkable progress in artificial intelligence (AI) has ushered in a new era
characterized by models with billions of parameters, enabling extraordinary capabilities …

HAP: A spatial-von Neumann heterogeneous automata processor with optimized resource and IO overhead on FPGA

X Wang, L Gong, J Cao, W Lou, W Wang… - Proceedings of the …, 2023 - dl.acm.org
Regular expression (REGEX) matching tasks drive much research on automata processors
(AP). Among them, the von Neumann AP can efficiently utilize on-chip memory to process …

Accelerating Irregular Applications via Efficient Synchronization and Data Access Techniques

C Giannoula - arXiv preprint arXiv:2211.05908, 2022 - arxiv.org
Irregular applications comprise an increasingly important workload domain for many fields,
including bioinformatics, chemistry, physics, social sciences and machine learning …

ngAP: Non-blocking Large-scale Automata Processing on GPUs

T Ge, T Zhang, H Liu - Proceedings of the 29th ACM International …, 2024 - dl.acm.org
Finite automata serve as compute kernels for various applications that require high
throughput. However, despite the increasing compute power of GPUs, their potential in …

Low-power near-data instruction execution leveraging opcode-based timing analysis

T Athanasios, D Georgios, S Georgios - ACM Transactions on …, 2022 - dl.acm.org
Traditional processor architectures utilize an external DRAM for data storage, while they
also operate under worst-case timing constraints. Such designs are heavily constrained by …

DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata Processor

Y Liu, X Zhang, D Zhuang, X Fu, S Song - ACM Transactions on …, 2022 - dl.acm.org
Dynamic graph traversals (DGTs) currently are widely used in many important application
domains, especially in this big-data era that urgently demands high-performance graph …

[PDF] Acceleration of Irregular Applications through Efficient Synchronization Techniques and Optimized Data Access Techniques

Χ Γιαννούλα - 2023 - dspace.lib.ntua.gr
Abstract: Irregular applications, such as graph processing, parallel data structures,
and the solution of sparse linear systems, constitute …