Continuing from my earlier article, I have dug a little deeper into what these technologies are, where they are used, and how to implement them. I will present this material in two parts:
- An introduction to some of the terminology and the use case we are going to implement
- Implementation details of the use case
Hadoop is primarily a distributed file system (DFS) where you can store terabytes or petabytes of data on low-end commodity computers. This is similar to how companies like Yahoo and Google store their crawled page data.
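To make the idea concrete, here is a small illustration of how a distributed file system spreads a large file across commodity machines. The class and numbers below are for illustration only, not the Hadoop API; the 128 MB block size and 3x replication are typical HDFS defaults.

```java
// Illustration only: a distributed file system splits a large file into
// fixed-size blocks and replicates each block across commodity machines.
// Hypothetical class, not the Hadoop API; 128 MB / 3x are typical defaults.
public class BlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB per block
    static final int REPLICATION = 3;                  // copies of each block

    // Number of blocks needed to store a file of the given size.
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Total block replicas the cluster keeps for this file.
    static long totalReplicas(long fileSizeBytes) {
        return blockCount(fileSizeBytes) * REPLICATION;
    }
}
```

So a 1 TB file becomes thousands of blocks, each living on three different cheap machines, which is what makes petabyte-scale storage on commodity hardware practical.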
HBase is a BigTable implementation that can use either the local file system or Hadoop DFS to store its data. It is broadly similar to an RDBMS like Oracle, with two main differences:
- It is designed to store sparse data: the typical use case has a large number of columns, with no data stored in most of them
- It has powerful versioning capabilities: a typical use case is tracking the trajectory of a rocket over a period of time
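To make the versioning point concrete, here is a minimal sketch of a cell that keeps multiple timestamped versions of a value, the way HBase keeps older versions of, say, a rocket's coordinates. This is plain Java modeling the concept, not the HBase client API.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal model of an HBase-style versioned cell: every put is stored
// under its timestamp, and a read can ask for the value as of any point
// in time. Plain Java illustration, not the HBase client API.
public class VersionedCell {
    // Timestamp -> value, kept in ascending timestamp order.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Latest value written at or before the given timestamp, or null.
    public String getAsOf(long timestamp) {
        Map.Entry<Long, String> e = versions.floorEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    // Most recent value, or null if the cell is empty.
    public String getLatest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }
}
```

An RDBMS row normally holds only the current value; a versioned cell like this lets you replay the whole trajectory by querying successive timestamps.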
Hive is built on the MapReduce concept and runs on top of Hadoop. It supports an SQL-like query language. Its main purpose is data-warehouse analysis and ad-hoc querying. Hive can also be layered on top of HBase.
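As a taste of the SQL-like querying, here is the kind of query our metrics step could run, wrapped in a small Java helper. The table name `access_log` and its `ip` column are assumptions for illustration; in the real application this string would be submitted through Hive (for example via its JDBC driver), and Hive would turn it into MapReduce jobs.

```java
// Sketch of an SQL-like (HiveQL-style) query for the traffic metrics.
// The table name `access_log` and column `ip` are hypothetical; the
// real schema would come from how we load the log feed.
public class MetricsQuery {
    // Visits per IP address, most frequent first.
    static String visitsPerIp() {
        return "SELECT ip, COUNT(*) AS visits "
             + "FROM access_log "
             + "GROUP BY ip "
             + "ORDER BY visits DESC";
    }
}
```

The appeal is exactly this: an analyst writes a familiar aggregate query instead of hand-coding a MapReduce job.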
Spring Batch is a Spring framework for managing batch workflows.
Let us say you have a website with a lot of traffic, and we need to build an application that analyzes that traffic: for example, how many times each user has visited the website, and which location the website is accessed from most. Below is the data flow,
Log file feed structure,
There will be a lookup table in HBase as below,
| IP address | User | Location |
| --- | --- | --- |
| 192.168.45.129 | kp | Boston |
| 192.168.45.126 | kk | Bangalore |
| 192.168.45.127 | ak | Bangalore |
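To show how the lookup fits in, here is a sketch in plain Java where the HBase table above is modeled as an in-memory map keyed by IP address (in HBase the IP would be the row key). The log line format is an assumption: the IP is taken to be the first whitespace-separated token, as in common access-log formats.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the lookup step: the HBase lookup table is modeled as an
// in-memory map (row key = IP address). Not the HBase client API; the
// log line format (IP as first token) is an assumption.
public class IpLookup {
    static final Map<String, String[]> LOOKUP = new HashMap<>();
    static {
        LOOKUP.put("192.168.45.129", new String[] {"kp", "Boston"});
        LOOKUP.put("192.168.45.126", new String[] {"kk", "Bangalore"});
        LOOKUP.put("192.168.45.127", new String[] {"ak", "Bangalore"});
    }

    // Extract the IP address from a log line (assumed first token).
    static String extractIp(String logLine) {
        return logLine.trim().split("\\s+")[0];
    }

    // Resolve a log line to the visitor's location, or null if unknown.
    static String locationFor(String logLine) {
        String[] row = LOOKUP.get(extractIp(logLine));
        return row == null ? null : row[1];
    }
}
```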
Now we write two applications:
- A Spring Batch job, run from the command line, that pulls the log feed and pushes the IP address data into HBase
- A Java command-line application that pulls the HBase information through Hive and displays the metrics as follows
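The two metrics described earlier, visits per user/IP and the most-accessed location, can be sketched in plain Java. Here the data arrives as simple lists and maps; in the real application it would come from HBase through a Hive query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the metrics the second application would display. The
// inputs are plain collections here; in the real flow they would be
// the results of a Hive query over the HBase data.
public class TrafficMetrics {
    // Count how many times each IP appears in the log feed.
    static Map<String, Integer> visitsPerIp(List<String> ips) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ip : ips) {
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    // Location with the most accesses, given an ip -> location lookup.
    static String topLocation(List<String> ips, Map<String, String> ipToLocation) {
        Map<String, Integer> byLocation = new HashMap<>();
        for (String ip : ips) {
            String loc = ipToLocation.get(ip);
            if (loc != null) {
                byLocation.merge(loc, 1, Integer::sum);
            }
        }
        String best = null;
        for (Map.Entry<String, Integer> e : byLocation.entrySet()) {
            if (best == null || e.getValue() > byLocation.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```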
Now that you know the technology pieces and the high-level solution, in my next article I will show how to implement this with working code.