Hadoop and Mahout in Data Mining

Introduction: Data Mining

This blog talks about a typical solution architecture where we use various BigData components like Hadoop, Mahout, Sqoop, Flume and Mongo-Hadoop Connector. Lot of successful companies have been around for atleast 20yrs and have invested in IT infrastructure. They have over a period of time, accumulated data in the form of transaction data, log archive and are eager to analyse these data and see if they can improve the business efficiency out of it.

Below is the Component Diagram of how each of the above frameworks fit into the ecosystem.

Data Mining

In a typical scenario, as the customer are using the IT systems of the company for buying companies products the customer data like , their purchasing patterns, their location, their Gender, how are other similar customers are purchasing are being stored for data mining. Based on this data, the company can helping customers make buying decisions. If you notice carefully Netflix, Amazon, eBay, Gilt are already doing this.

Mahout is one of the open source framework which has a built in Recommender engine. It also runs on Hadoop using Map/Reduce paradigm. In a typical scenario,

  1. User Item data that is stored in MySQL is Data Mining into Hadoop Distributed File System (HDFS) using a ETL tool like Sqoop
  2. User click information that is stored in Log files is exported into HDFS using log migration tools like Flume
  3. If the transaction data is stored in NoSQL, there are connectors to export the data to HDFS

Once the data is in HDFS, we can use Mahout Batch job to run the data analysis and import the processed data back to transactional database.

About these ads

One thought on “Hadoop and Mahout in Data Mining

  1. Tiru Murugan

    Hi krish,

    Thanks for your valuable Information, i had same problem last few months, i dont know where i apply Hadoop and its subprojects… Now i got some idea..

    with Regards
    Tirumurugan.v

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s