Building Machine Learning Algorithms on Hadoop for Bigdata

Author(s) Asha T | Shravanthi U.M | Nagashree N | Monika M
Pages 143-147
Volume 3
Issue 2
Date February, 2013
Keywords Machine Learning, Hadoop, Mapreduce, Parallelism, Bigdata

Abstract

Machine Learning (ML) is at the core of data analysis. Machine Learning Algorithms (MLAs) are sequential and recursive, and their accuracy depends on the size of the data (i.e., the more data, the more accurate the result). The absence of a reliable framework that lets MLAs work on Bigdata has kept these algorithms from reaching their full potential. Hadoop is one such framework: it offers distributed storage and parallel data processing. The existing problem with implementing MLAs on Hadoop is that, because of their recursive nature, MLAs need the data to be stored in a single place, whereas Hadoop does not support such data sharing. In this paper we propose an approach to building Machine Learning models for recursive MLAs on Hadoop, so that the combined power of Machine Learning and Hadoop can be applied to processing Bigdata. We compare the performance of the ID3 decision tree algorithm, the K-means clustering algorithm and the K-Nearest Neighbor algorithm in both a serial implementation and a parallel implementation using Hadoop.
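To illustrate how an iterative algorithm such as K-means can be fitted into the MapReduce model, here is a minimal sketch. It is not the authors' implementation. It is a generic, commonly used decomposition, simulated in plain Python without any Hadoop API. The mapper assigns each point to its nearest centroid, a simulated shuffle groups the mapper output by cluster id, and the reducer averages each group into a new centroid. The driver loop stands in for chaining one MapReduce job per iteration, passing the updated centroids to the next job's mappers. All function names (`kmeans_map`, `kmeans_reduce`, `run_iteration`, `kmeans`) are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    # Index of the centroid closest to `point` (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(point, centroids):
    # Map phase (hypothetical): emit (cluster_id, (partial_sum, count)) for one record.
    # A raw point is its own partial sum with a count of 1.
    yield nearest(point, centroids), (point, 1)


def kmeans_reduce(cluster_id, values):
    # Reduce phase (hypothetical): add partial sums and counts, then average.
    values = list(values)
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for partial_sum, n in values:
        for d in range(dim):
            total[d] += partial_sum[d]
        count += n
    return cluster_id, tuple(t / count for t in total)


def run_iteration(points, centroids):
    # Simulated shuffle: group mapper output by key, as Hadoop does between phases.
    groups = defaultdict(list)
    for p in points:
        for key, value in kmeans_map(p, centroids):
            groups[key].append(value)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for key, vals in groups.items():
        cid, centroid = kmeans_reduce(key, vals)
        new_centroids[cid] = centroid
    return new_centroids


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    # Driver loop: each pass stands in for one MapReduce job; it stops early
    # once no centroid moves by more than `tol`.
    for _ in range(max_iter):
        new_centroids = run_iteration(points, centroids)
        if all(math.dist(a, b) <= tol for a, b in zip(centroids, new_centroids)):
            return new_centroids
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    print(kmeans(data, [(1.0, 1.0), (8.0, 8.0)]))
```

On this toy data the two clusters should settle near (1.17, 1.5) and (8.5, 8.5) after two passes. The mapper emits `(partial_sum, count)` pairs rather than bare points. This keeps the reducer's input format the same whether or not a combiner has pre-aggregated points on each node, which is the usual way to cut shuffle traffic in this kind of job.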
