Recent trends in distributed online stream processing platform for big data: Survey

AH Ali, MZ Abdullah - 2018 1st Annual International …, 2018 - ieeexplore.ieee.org
There is no doubt that big data has become an important source of information and
knowledge, especially for large profitability companies such as Facebook and Amazon. But …

Towards a better replica management for Hadoop distributed file system

HE Ciritoglu, T Saber, TS Buda… - … Congress on Big …, 2018 - ieeexplore.ieee.org
The Hadoop Distributed File System (HDFS) is the storage of choice when it comes to large-
scale distributed systems. In addition to being efficient and scalable, HDFS provides high …

New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes.

K Ji, Y Kwon - Computer Systems Science & Engineering, 2023 - cdn.techscience.cn
As the importance of email increases, the amount of malicious email is also increasing, so
the need for malicious email filtering is growing. Since it is more economical to combine …

QaaD (Query-as-a-Data): Scalable execution of massive number of small queries in Spark

Y Park, B Tak, WS Han - Proceedings of the ACM on Management of …, 2023 - dl.acm.org
Spark big data processing platform is heavily used in today's IT services for various critical
applications such as machine learning tasks for service recommendations or massive …

HaRD: a heterogeneity-aware replica deletion for HDFS

HE Ciritoglu, J Murphy, C Thorpe - Journal of big data, 2019 - Springer
The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets
reliably on clusters of commodity machines. The HDFS takes advantage of replication to …

Toward Efficient Block Replication Management in Distributed Storage

J Liao, Z Sha, Z Cai, Z Liu, K Li, WK Liao… - ACM Transactions on …, 2020 - dl.acm.org
Distributed/parallel file systems commonly suffer from load imbalance and resource
contention due to the bursty characteristic exhibited in scientific applications. This article …

BT-Duper: A Binomial-Tree Based Data Replication Offloading Method with Native RDMA Primitives

Y Yi, Y Li, Y Xu, P Wang, Y Xu… - 2023 IEEE Intl Conf on …, 2023 - ieeexplore.ieee.org
Remote Direct Memory Access (RDMA) has been widely used in distributed storage systems
as a low latency network technology, especially for the application of data replication. For …

The HDFS replica placement policies: A comparative experimental investigation

RWA Fazul, PP Barcelos - IFIP International Conference on Distributed …, 2022 - Springer
The Hadoop Distributed File System (HDFS) is a robust and flexible file system
designed for reliably storing large volumes of data in distributed environments. Its storage …

Importance of data distribution on Hive-based systems for query performance: An experimental study

HE Ciritoglu, J Murphy, C Thorpe - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
SQL-on-Hadoop systems have been gaining popularity in recent years. One popular
example of SQL-on-Hadoop systems is Apache Hive; the pioneer of SQL-on-Hadoop …

DATA DELETION USING NON-RETRIEVABLE BIT SEQUENCE OVERWRITING APPROACH IN CLOUD STORAGE

SB Joshi, SD Panchal - Indian Journal of Computer Science and …, 2021 - researchgate.net
Cloud storage utilization rapidly increases due to the on-demand availability of computer
system resources, especially computing power and storage requirements. It reduces the …