Theoretically-efficient and practical parallel DBSCAN

Y Wang, Y Gu, J Shun - Proceedings of the 2020 ACM SIGMOD …, 2020 - dl.acm.org
The DBSCAN method for spatial clustering has received significant attention due to its
applicability in a variety of data analysis tasks. There are fast sequential algorithms for …

Density-based algorithms for big data clustering using MapReduce framework: A comprehensive study

M Khader, G Al-Naymat - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Clustering is used to extract hidden patterns and similar groups from data. Therefore,
clustering as a method of unsupervised learning is a crucial technique for big data analysis …

A year in the life of a parallel file system

GK Lockwood, S Snyder, T Wang… - … conference for high …, 2018 - ieeexplore.ieee.org
I/O performance is a critical aspect of data-intensive scientific computing. We seek to
advance the state of the practice in understanding and diagnosing I/O performance issues …

Transparent asynchronous parallel I/O using background threads

H Tang, Q Koziol, J Ravi, S Byna - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Moving toward exascale computing, the size of data stored and accessed by applications is
ever increasing. However, traditional disk-based storage has not seen improvements that …

UniviStor: Integrated hierarchical and distributed storage for HPC

T Wang, S Byna, B Dong, H Tang - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
High performance computing (HPC) architectures have been adding new layers of storage,
such as burst buffers, to tolerate latency between memory and disk-based file systems …

Dynamic load balancing based on constrained kd tree decomposition for parallel particle tracing

J Zhang, H Guo, F Hong, X Yuan… - IEEE transactions on …, 2017 - ieeexplore.ieee.org
We propose a dynamically load-balanced algorithm for parallel particle tracing, which
periodically attempts to evenly redistribute particles across processes based on kd tree …

HY-DBSCAN: A hybrid parallel DBSCAN clustering algorithm scalable on distributed-memory computers

G Wu, L Cao, H Tian, W Wang - Journal of Parallel and Distributed …, 2022 - Elsevier
DBSCAN is a density-based clustering algorithm which is well known for its ability to discover
clusters of arbitrary shape as well as to distinguish noise. As it is computationally expensive …

μDBSCAN: an exact scalable DBSCAN algorithm for big data exploiting spatial locality

A Sarma, P Goyal, S Kumari, A Wani… - … on cluster computing …, 2019 - ieeexplore.ieee.org
DBSCAN is one of the most popular and effective clustering algorithms that is capable of
identifying arbitrary-shaped clusters and noise efficiently. However, its super-linear …

UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis

GK Lockwood, W Yoo, S Byna, NJ Wright… - Proceedings of the 2nd …, 2017 - dl.acm.org
I/O efficiency is essential to productivity in scientific computing, especially as many scientific
domains become more data-intensive. Many characterization tools have been used to …

Toward scalable and asynchronous object-centric data management for HPC

H Tang, S Byna, F Tessier, T Wang… - 2018 18th IEEE/ACM …, 2018 - ieeexplore.ieee.org
Emerging high performance computing (HPC) systems are expected to be deployed with an
unprecedented level of complexity due to a deep system memory and storage hierarchy …