Authors
Jianguo Chen, Kenli Li, Zhuo Tang, Kashif Bilal, Shui Yu, Chuliang Weng, Keqin Li
Publication date
2016/8/31
Journal
IEEE Transactions on Parallel and Distributed Systems
Volume
28
Issue
4
Pages
919-933
Publisher
IEEE
Description
With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasing attention from both academia and industry. This paper presents a Parallel Random Forest (PRF) algorithm for big data on the Apache Spark platform. The PRF algorithm is optimized based on a hybrid approach combining data-parallel and task-parallel optimization. From the perspective of data-parallel optimization, a vertical data-partitioning method is performed to reduce the data communication cost effectively, and a data-multiplexing method is performed to allow the training dataset to be reused and diminish the volume of data. From the perspective of task-parallel optimization, a dual parallel approach is carried out in the training process of RF, and a task Directed Acyclic Graph (DAG) is created according to the parallel training process of PRF …
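The vertical data-partitioning idea mentioned in the description can be illustrated with a minimal sketch. This is not the paper's Spark implementation: it assumes a small in-memory table of categorical features, uses plain information gain as the split score, and uses a thread pool as a stand-in for distributed tasks. The function names `vertical_partition`, `information_gain`, and `best_feature` are hypothetical, chosen only for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def vertical_partition(rows, label_index):
    """Split a row-oriented table into one (feature_index, values, labels) subset per feature."""
    labels = [row[label_index] for row in rows]
    subsets = []
    for j in range(len(rows[0])):
        if j == label_index:
            continue
        # Each subset carries a single feature column plus the label column,
        # so a worker can score that feature without seeing the others.
        subsets.append((j, [row[j] for row in rows], labels))
    return subsets


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(subset):
    """Score one feature subset; assumes categorical feature values."""
    j, values, labels = subset
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return j, entropy(labels) - remainder


def best_feature(rows, label_index):
    """Score every feature subset independently and return (feature_index, gain) of the best."""
    subsets = vertical_partition(rows, label_index)
    # Threads stand in for distributed tasks here (an assumption of this sketch).
    with ThreadPoolExecutor() as pool:
        gains = list(pool.map(information_gain, subsets))
    return max(gains, key=lambda g: g[1])


if __name__ == "__main__":
    rows = [
        ("a", "x", "yes"),
        ("a", "y", "yes"),
        ("b", "x", "no"),
        ("b", "y", "no"),
    ]
    print(best_feature(rows, label_index=2))
```

Because each subset holds only one feature column and the labels, scoring a feature needs no data from other columns, which is the property that lets a column-wise layout cut communication between workers.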
Total citations
[Chart: citations per year, 2017-2024]
Scholar articles
J Chen, K Li, Z Tang, K Bilal, S Yu, C Weng, K Li - IEEE Transactions on Parallel and Distributed Systems, 2016