Hayder K Fatlawi, Attila Kiss
Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. Sectio Computatorica
The rapid growth in the volume of data generated by digital sources has motivated a new class of data mining techniques that can be trained continuously, in parallel with data stream generation. Such techniques must adapt to new data samples and forget old ones, using methods such as the Adaptive Sliding Window (ADWIN). The Random Forest algorithm has been extended to handle data streams by combining ADWIN with Hoeffding tree classifiers. Although reducing the number of data samples processed at each time step lowers the required resources (time and space), the high dimensionality of the data features remains a challenge. In this work, ARF-PCA, a stream data classification model, is proposed to improve the efficiency of the Adaptive Random Forest (ARF) classifier using Principal Component Analysis (PCA). An evaluation of the proposed model on three real datasets showed a significant improvement in efficiency while preserving classification accuracy.
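To make the "adapt to new samples and forget old ones" idea concrete, the following is a minimal sketch of an ADWIN-style adaptive window. It is a deliberate simplification and not the implementation used in this work: it checks every split point of the window directly instead of using ADWIN's compressed bucket structure, and it drops the whole older sub-window at the first split where the two means differ by more than the cut threshold. The class name `SimpleAdaptiveWindow` and the defaults `delta=0.002` and `max_size=1000` are assumptions chosen for illustration; inputs are assumed to lie in [0, 1], for example a stream of 0/1 prediction errors.

```python
import math
from collections import deque


class SimpleAdaptiveWindow:
    """Naive ADWIN-style window: tests every split, no bucket compression."""

    def __init__(self, delta=0.002, max_size=1000):
        self.delta = delta                    # confidence parameter (assumed default)
        self.window = deque(maxlen=max_size)  # bounded memory (assumed cap)

    def update(self, value):
        """Add a value in [0, 1]; return True if old samples were forgotten."""
        self.window.append(value)
        return self._shrink()

    def _shrink(self):
        changed = False
        shrunk = True
        while shrunk and len(self.window) >= 2:
            shrunk = False
            items = list(self.window)
            n = len(items)
            total = sum(items)
            delta_prime = self.delta / n
            head_sum = 0.0
            for i in range(1, n):
                head_sum += items[i - 1]
                n0, n1 = i, n - i                  # sizes of old and recent parts
                mean0 = head_sum / n0
                mean1 = (total - head_sum) / n1
                m = 1.0 / (1.0 / n0 + 1.0 / n1)    # harmonic-style size term
                eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
                if abs(mean0 - mean1) > eps_cut:
                    # The two parts differ significantly: forget the older part.
                    for _ in range(n0):
                        self.window.popleft()
                    shrunk = changed = True
                    break
        return changed


if __name__ == "__main__":
    detector = SimpleAdaptiveWindow()
    stream = [0.0] * 200 + [1.0] * 200  # synthetic stream with one abrupt change
    for t, x in enumerate(stream):
        if detector.update(x):
            print("change detected at step", t, "window size", len(detector.window))
```

In a stream classifier, a detector of this kind would typically monitor the error signal of a model so that the model can be refreshed once the window shrinks. The per-update cost here grows with the window length, which is why practical ADWIN implementations compress the window into buckets.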