Authors
Jesus Maillo, Sergio Ramírez, Isaac Triguero, Francisco Herrera
Publication date
2017/2/1
Journal
Knowledge-Based Systems
Volume
117
Pages
3-15
Publisher
Elsevier
Description
The k-Nearest Neighbors classifier is a simple yet effective and widely renowned method in data mining. Applying this model directly in the big data domain is not feasible because of time and memory restrictions. Several distributed alternatives based on MapReduce have been proposed to enable this method to handle large-scale data. However, their performance can be further improved with new designs that fit newly arising technologies.
In this work we provide a new solution to perform an exact k-nearest neighbor classification based on Spark. We take advantage of its in-memory operations to classify large amounts of unseen cases against a big training dataset. The map phase computes the k-nearest neighbors in different training data splits. Afterwards, multiple reducers process the definitive neighbors from the list obtained in the map phase. The key point of this proposal lies in the management of the test …
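The map and reduce phases described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than Spark, and it is not the authors' implementation: the function names (`map_split`, `reduce_neighbors`, `knn_classify`), the Euclidean distance, and the majority vote are all assumptions chosen for illustration. Each training split yields its own k candidate neighbors per test point, and merging the candidate lists while keeping the k closest gives the same result as a search over the whole training set.

```python
import heapq
import math
from collections import Counter


def map_split(train_split, test_points, k):
    """Map phase (hypothetical helper): k nearest neighbors of each test
    point, computed within a single training split."""
    result = {}
    for idx, x in enumerate(test_points):
        candidates = [(math.dist(x, features), label)
                      for features, label in train_split]
        result[idx] = heapq.nsmallest(k, candidates)
    return result


def reduce_neighbors(a, b, k):
    """Reduce phase (hypothetical helper): merge two candidate lists and
    keep only the k closest neighbors."""
    return heapq.nsmallest(k, a + b)


def knn_classify(train_splits, test_points, k):
    """Run the map step on every split, merge per test point, then take a
    majority vote over the final k neighbors (voting rule is an assumption)."""
    partials = [map_split(split, test_points, k) for split in train_splits]
    predictions = []
    for idx in range(len(test_points)):
        merged = []
        for partial in partials:
            merged = reduce_neighbors(merged, partial[idx], k)
        votes = Counter(label for _, label in merged)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data: two training splits, two test points.
train_splits = [
    [((0.0, 0.0), "a"), ((0.1, 0.0), "a")],
    [((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((0.0, 0.2), "a")],
]
test_points = [(0.0, 0.1), (5.0, 5.1)]
predictions = knn_classify(train_splits, test_points, k=3)
```

In a Spark setting the per-split step would typically run once per partition of the training data and the merge would be a keyed reduction over test-point identifiers; the sketch only mirrors that data flow on in-memory lists.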
Total citations
[Chart: citations per year, 2016 to 2024]