SynCron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …

DIMM-Link: Enabling efficient inter-DIMM communication for near-memory processing

Z Zhou, C Li, F Yang, G Sun - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
DIMM-based near-memory processing architectures (DIMM-NMP) have received growing
interest from both academia and industry. They have the advantages of large memory …

Spread-n-Share: improving application performance and cluster throughput with resource-aware job placement

X Tang, H Wang, X Ma, N El-Sayed, J Zhai… - Proceedings of the …, 2019 - dl.acm.org
Traditional batch job schedulers adopt the Compact-n-Exclusive (CE) strategy, packing
processes of a parallel job into as few compute nodes as possible. While CE minimizes inter …

Boosting Performance and QoS for Concurrent GPU B+ trees by Combining-Based Synchronization

W Zhang, C Zhao, L Peng, Y Lin, F Zhang… - Proceedings of the 28th …, 2023 - dl.acm.org
Concurrent B+ trees have been widely used in many systems. With the scale of data
requests increasing exponentially, the systems are facing tremendous performance …

Massively scaling seismic processing on Sunway TaihuLight supercomputer

Y Hu, H Yang, Z Luan, L Gan, G Yang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Common Midpoint (CMP) and Common Reflection Surface (CRS) are widely used methods
for improving the signal-to-noise ratio in the field of seismic processing. These methods are …

Accelerating Irregular Applications via Efficient Synchronization and Data Access Techniques

C Giannoula - arXiv preprint arXiv:2211.05908, 2022 - arxiv.org
Irregular applications comprise an increasingly important workload domain for many fields,
including bioinformatics, chemistry, physics, social sciences and machine learning …

Paths to fast barrier synchronization on the node

C Hetland, G Tziantzioulis, B Suchy… - Proceedings of the 28th …, 2019 - dl.acm.org
Synchronization primitives like barriers heavily impact the performance of parallel programs.
As core counts increase and granularity decreases, the value of enabling fast barriers …

DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations

V Soria-Pardos, A Armejach, T Mück… - Proceedings of the 50th …, 2023 - dl.acm.org
With increasing core counts in modern multi-core designs, the overhead of synchronization
jeopardizes the scalability and efficiency of parallel applications. To mitigate these …

GreenB+ Tree: an energy-efficient B+ tree for MIMD architectures

M Peng, Q Wang, Y Liang, W Guo, S Yang… - CCF Transactions on …, 2024 - Springer
In the current data-intensive landscape, B+ trees are crucial data structures utilized across
various fields like databases and web indexing. With the rise of data explosion, the demand …

High performance GPU concurrent B+ tree

W Zhang, C Zhao, L Peng, Y Lin, F Zhang… - Proceedings of the 27th …, 2022 - dl.acm.org
Concurrent B+ trees have been widely used in many systems from file systems to databases.
With the volume of data requests expanding exponentially, the systems are facing …