Authors
Laouni Djafri, Djamel Amar Bensaber, Reda Adjoudj
Publication date
2018/8/20
Journal
Information Discovery and Delivery
Volume
46
Issue
3
Pages
147-160
Publisher
Emerald Publishing Limited
Description
Purpose
This paper aims to address the volume, veracity and velocity problems of big data analytics for prediction by raising the prediction result to an acceptable level in the shortest possible time.
Design/methodology/approach
This paper is divided into two parts. The first aims to improve the prediction result through two proposals: a double pruning enhanced random forest algorithm, and a shared learning base extracted by stratified random sampling so that the learning base is representative of all the original data. The second part proposes a distributed architecture supported by new technology solutions, which works coherently and efficiently with the sampling strategy under the supervision of the Map-Reduce algorithm.
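As an illustration of the stratified random sampling idea mentioned above, the following is a minimal sketch and not the authors' implementation. The function name `stratified_sample`, the `label_of` accessor, the sampling fraction and the toy records are all assumptions made for this example. It draws the same fraction from every class so that the sample keeps the class proportions of the full data set.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Draw roughly `fraction` of each class (stratum) at random.

    Hypothetical helper for illustration only; the paper's own
    procedure may differ.
    """
    rng = random.Random(seed)

    # Group the records by class label: each group is one stratum.
    strata = defaultdict(list)
    for rec in records:
        strata[label_of(rec)].append(rec)

    # Sample every stratum separately, keeping at least one record
    # so that small classes are never dropped entirely.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy data (assumed): 80 records of class "a" and 20 of class "b".
records = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(records, label_of=lambda r: r[0], fraction=0.1)
```

With this toy data the sample holds 10 records, and the 80/20 class split of the original set is preserved in the sample. In a distributed setting, each node could apply the same routine to its own partition, although how the paper combines the partial and shared bases is not shown here.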
Findings
The representative learning base obtained by the integration of two learning bases, the partial base and the shared …
Total citations
Citations per year: 2019: 1, 2020: 3, 2021: 4, 2022: 3, 2023: 3, 2024: 1