Authors
Shweta Mittal, Om Prakash Sangwan
Publication date
2021/6/29
Journal
J. Math. Comput. Sci.
Volume
11
Issue
5
Pages
5267-5277
Description
A massive amount of data is generated every day from a large number of sources. Spark is a popular, freely available open-source platform for storing and processing big data. To train machines to learn hidden patterns and information from these huge raw datasets, machine learning algorithms need to be implemented. ML and MLlib are the two machine learning libraries for implementing such algorithms in Spark. In this paper, Decision Trees, Random Forests and Gradient Boosted Trees are implemented on the Cardiac and Telecom datasets, both on a local PC and on Google Colab. Gradient Boosted Trees achieved higher accuracy than Decision Trees and Random Forests but took longer to execute. The algorithms also ran faster on the Colab GPU than on the local PC.
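The comparison described in the abstract could be set up roughly as follows with Spark's DataFrame-based ML library. This is a minimal sketch, not the authors' code: the helper names `timed` and `compare_tree_models`, the column names `label` and `features`, and the use of default hyperparameters are all assumptions made for illustration, and the paper's datasets and settings are not reproduced here.

```python
import time


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_tree_models(train_df, test_df, label_col="label", features_col="features"):
    """Fit three tree-based classifiers; report accuracy and fit time for each.

    Assumes (hypothetically) that both DataFrames already hold a numeric
    label column and an assembled feature-vector column.
    """
    # Imported inside the function so that `timed` stays usable without Spark.
    from pyspark.ml.classification import (
        DecisionTreeClassifier,
        GBTClassifier,
        RandomForestClassifier,
    )
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="accuracy"
    )
    estimators = {
        "DecisionTree": DecisionTreeClassifier(labelCol=label_col, featuresCol=features_col),
        "RandomForest": RandomForestClassifier(labelCol=label_col, featuresCol=features_col),
        # GBTClassifier in spark.ml supports binary labels only.
        "GBT": GBTClassifier(labelCol=label_col, featuresCol=features_col),
    }

    report = {}
    for name, estimator in estimators.items():
        model, seconds = timed(estimator.fit, train_df)
        accuracy = evaluator.evaluate(model.transform(test_df))
        report[name] = {"accuracy": accuracy, "fit_seconds": seconds}
    return report
```

Given a prepared DataFrame `df` (hypothetical), a split such as `train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)` followed by `compare_tree_models(train_df, test_df)` would return one accuracy and one fit time per model. Wall-clock fit time is only a rough proxy for the execution times the abstract compares.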
Scholar articles
S Mittal, OP Sangwan - J. Math. Comput. Sci., 2021