Text Classification Using Mahout
G.V. Ramana Reddy¹, K. Mounika², A. Chinmayi², S. Fareed Hussain²
Citation: G.V. Ramana Reddy, K. Mounika, A. Chinmayi, S. Fareed Hussain, "Text Classification Using Mahout," International Journal of Research Studies in Computer Science and Engineering, 2014, 1(5): 1-5.
The storage, processing and analysis of big data present a plethora of new challenges to computer science researchers and IT professionals. Mahout is a set of distributed data mining libraries that interface with an underlying distributed system. That system's framework is Hadoop, which implements MapReduce. Mahout provides a library of scalable machine learning algorithms for big data analysis on Hadoop or other storage systems. Classification techniques decide whether, or to what degree, an item belongs to a given type or category, or has a given attribute. Like clustering, classification is ubiquitous, but it operates even more behind the scenes. This paper demonstrates classification using Mahout. The sample data is taken from the 20 Newsgroups dataset, and the resulting confusion matrix is presented.
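To make the two ideas in the abstract concrete (assigning documents to categories, then summarizing the results in a confusion matrix), here is a minimal single-machine sketch in plain Python. It is not Mahout and not the authors' pipeline. It assumes a multinomial naive Bayes classifier with add-one smoothing, which the abstract does not name, and it uses a handful of hypothetical toy documents labeled with two 20 Newsgroups category names rather than the real dataset. The functions `train_nb`, `predict_nb` and `confusion_matrix` are hypothetical helpers written for this illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def train_nb(docs):
    """Hypothetical helper: multinomial naive Bayes from (text, label) pairs."""
    class_docs = Counter()             # number of training documents per label
    word_counts = defaultdict(Counter) # per-label word frequencies
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    total_docs = sum(class_docs.values())
    log_priors = {c: math.log(n / total_docs) for c, n in class_docs.items()}
    return log_priors, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior score."""
    log_priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, score in log_priors.items():
        total = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen in training
                # Add-one (Laplace) smoothed likelihood of the word given the label
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical toy data, not taken from the 20 Newsgroups corpus.
train = [
    ("rocket launch orbit nasa", "sci.space"),
    ("shuttle orbit astronaut", "sci.space"),
    ("engine car brakes tires", "rec.autos"),
    ("car dealer engine oil", "rec.autos"),
]
test = [
    ("nasa rocket orbit", "sci.space"),
    ("car tires brakes", "rec.autos"),
    ("engine launch", "sci.space"),
]

labels = ["sci.space", "rec.autos"]
model = train_nb(train)
predicted = [predict_nb(model, text) for text, _ in test]
actual = [label for _, label in test]
matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)  # sci.space [1, 1] then rec.autos [0, 1]
```

Diagonal entries count correctly classified documents. The off-diagonal 1 in the `sci.space` row is the test document "engine launch", which lands in `rec.autos` because "engine" appears more often in that category's training text. With all 20 newsgroups as categories, the matrix would have 20 rows and 20 columns.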