Authors
Dweepna Garg, Parth Gohil, Khushboo Trivedi
Publication date
2015/3/5
Conference
2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT)
Pages
1-5
Publisher
IEEE
Description
Apache Hadoop is an open source software framework for storing and processing big data, both structured and unstructured. It has become one of the main drivers in the market because it makes data storage inexpensive. Hadoop stores data in a distributed file system, which lets users store large amounts of data simply by adding more nodes to a Hadoop cluster. Clustering such large amounts of data remains a point of concern. MapReduce, the programming model used by Hadoop, enables parallelization by decomposing a larger problem over a large dataset into smaller portions of data and then processing them. Mahout, a scalable machine learning library that runs on Hadoop, offers one approach to clustering. In this paper, a Hadoop multi-node cluster is formed using Amazon EC2. The paper focuses on the fuzzy k-means clustering algorithm, modified by a centroid generation method, using MapReduce in …
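To make the description concrete, the following is a minimal single-machine sketch of how one iteration of standard fuzzy k-means might be split into a map step and a reduce step. It is not the paper's modified centroid generation method and it does not use the Mahout or Hadoop APIs; the function names (`memberships`, `map_point`, `reduce_cluster`, `fuzzy_kmeans`), the fuzziness parameter `m = 2.0`, and the sample data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def memberships(point, centroids, m=2.0):
    """Standard fuzzy k-means membership of one point in every cluster."""
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster.
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centroids))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists]


def map_point(point, centroids, m=2.0):
    """Map step: emit (cluster index, (weighted point, weight)) pairs."""
    for j, u in enumerate(memberships(point, centroids, m)):
        w = u ** m
        yield j, ([w * x for x in point], w)


def reduce_cluster(values):
    """Reduce step: weighted mean of all contributions to one cluster."""
    values = list(values)
    dim = len(values[0][0])
    total_w = sum(w for _, w in values)
    return [sum(vec[i] for vec, _ in values) / total_w for i in range(dim)]


def fuzzy_kmeans_step(points, centroids, m=2.0):
    """One iteration: map every point, group by cluster index, then reduce."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        for j, value in map_point(p, centroids, m):
            grouped[j].append(value)
    return [reduce_cluster(grouped[j]) for j in range(len(centroids))]


def fuzzy_kmeans(points, centroids, m=2.0, iters=50, tol=1e-6):
    """Repeat iterations until the centroids move less than tol."""
    for _ in range(iters):
        new = fuzzy_kmeans_step(points, centroids, m)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centroids = fuzzy_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

In a real Hadoop job the grouping dictionary would be replaced by the framework's shuffle phase, and each iteration would typically run as a separate job that reads the previous centroids.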
Total citations
[Chart: citations per year, 2016-2021]
Scholar articles
D Garg, P Gohil, K Trivedi - 2015 IEEE International Conference on Electrical …, 2015