Authors
Adnan Ali, Nada Masood Mirza, Mohamad Khairi Ishak
Publication date
2022/5/24
Conference
2022 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)
Pages
1-4
Publisher
IEEE
Description
In the current era of big data, enormous volumes of data are recorded every second from multiple streams and environments of different types. This data is processed with specialized tools such as Hadoop, which manages processing by taking memory, process allocation, size, and storage into account. The Hadoop framework is known to be efficient with a few large files rather than many small files; large numbers of small files keep the framework from working efficiently and greatly increase processing time. To eliminate this issue, this work proposes a new algorithm for merging many small files into a single large file based on certain match criteria (type and size). This process is executed before the files are passed to the Hadoop framework. The proposed algorithm ensures that it generates the least number of large files, which reduces the I …
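The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: group small files by type, then pack each group into merged files up to a size limit before handing them to Hadoop. The function name `plan_merges`, the use of the file extension as the "type", the `target_size` parameter, and the first-fit-decreasing packing heuristic are all assumptions made for illustration, not the authors' method. First-fit decreasing keeps the number of merged files small but does not guarantee the minimum.

```python
from collections import defaultdict
from pathlib import PurePath


def plan_merges(files, target_size):
    """Plan which small files to merge together (hypothetical helper).

    files: iterable of (name, size_in_bytes) pairs.
    target_size: assumed upper bound in bytes for one merged file,
        for example the HDFS block size.
    Returns a list of (extension, [names]) groups, one per merged file.
    """
    # Match criterion 1 (assumed): files are the same type if they
    # share a file extension.
    by_type = defaultdict(list)
    for name, size in files:
        by_type[PurePath(name).suffix.lower()].append((name, size))

    plan = []
    for suffix in sorted(by_type):
        bins = []  # each bin is [used_bytes, [names]]
        # Match criterion 2 (assumed): pack by size, largest first,
        # placing each file in the first bin that still has room.
        ordered = sorted(by_type[suffix], key=lambda f: (-f[1], f[0]))
        for name, size in ordered:
            for b in bins:
                if b[0] + size <= target_size:
                    b[0] += size
                    b[1].append(name)
                    break
            else:
                # No existing bin fits (or the file alone exceeds the
                # limit), so it starts a new merged file.
                bins.append([size, [name]])
        plan.extend((suffix, names) for _, names in bins)
    return plan


if __name__ == "__main__":
    sample = [("a.log", 60), ("b.log", 50), ("c.log", 40),
              ("d.csv", 30), ("e.csv", 80)]
    for suffix, names in plan_merges(sample, target_size=100):
        print(suffix, names)
```

The sketch only produces a merge plan; actually concatenating the files and writing them to HDFS is left out, since the abstract does not say how the merged files are stored or indexed.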
Scholar articles
A Ali, NM Mirza, MK Ishak - 2022 19th International Conference on Electrical …, 2022