Monthly Archives: May 2012

SpringData-Hadoop: Jumpstart Hadoop with Spring

These days there is a lot of hype around terms like Hadoop, HBase, Hive, Pig and BigData. I was itching to learn what these terms mean and how they are used in the real world. I set myself 2 goals:

  1. Create a Hadoop single-node instance
  2. Figure out how it integrates with Spring/Spring Batch

As usual, I googled how to quickly set up and learn these tools. The journey was not smooth. For a Windows user, these are the options for setting up a Hadoop single-node cluster on your machine:

  1. Cygwin: This approach is not easy to set up. I struggled with it for a few days on my Windows 7 machine without much to show for it.
  2. Open source and commercial VMs: EMC Greenplum (commercial) and Cloudera / Yahoo (open source) have created VMware images with Hadoop and Hive bundled into the VM, and they claim these work out of the box. The Yahoo VM partially worked on my machine, but it is outdated and does not integrate with Spring. The Cloudera VM did not work on my machine because of some 64-bit conflicts.
  3. I got another VM image from Cloudera for 32-bit and it worked. This is an Ubuntu VM with all the above tools installed and preconfigured.

I went with option 3. You can start the VM and do some quick tests as described in the tutorial. If you are in a real hurry, you can open the terminal and run these commands:

cd /usr/lib/hadoop
hadoop jar hadoop-examples.jar pi 10 1000000

Congratulations, you just ran your first Hadoop job.
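
To get a feel for what that job computes: the pi example estimates π by scattering sample points over a unit square and counting how many land inside the inscribed circle, and its two arguments are the number of map tasks and the number of samples per map. Below is a minimal plain-Java sketch of that idea. It is not the Hadoop example's source: the class name PiSketch, its method names, and the way the samples are split per "map" are my own assumptions for illustration.

```java
// Plain-Java sketch of the idea behind the Hadoop "pi" example.
// Not Hadoop code: PiSketch and its methods are hypothetical names.
class PiSketch {

    // Radical inverse of index in the given base: one coordinate of a Halton point in [0, 1).
    static double halton(long index, int base) {
        double result = 0.0;
        double fraction = 1.0 / base;
        while (index > 0) {
            result += (index % base) * fraction;
            index /= base;
            fraction /= base;
        }
        return result;
    }

    // "Map" step: count the samples in [offset, offset + size) that fall inside
    // the circle of radius 0.5 centred in the unit square.
    static long countInside(long offset, long size) {
        long inside = 0;
        for (long i = offset; i < offset + size; i++) {
            double x = halton(i, 2) - 0.5;
            double y = halton(i, 3) - 0.5;
            if (x * x + y * y <= 0.25) {
                inside++;
            }
        }
        return inside;
    }

    // "Reduce" step: sum the per-map counts and scale the area ratio (pi / 4) up to pi.
    static double estimate(int maps, long samplesPerMap) {
        long inside = 0;
        for (int m = 0; m < maps; m++) {
            inside += countInside(m * samplesPerMap, samplesPerMap);
        }
        return 4.0 * inside / ((double) maps * samplesPerMap);
    }

    public static void main(String[] args) {
        // Same shape as "pi 10 1000000", with fewer samples so it finishes quickly.
        System.out.println(PiSketch.estimate(10, 100_000));
    }
}
```

More maps or more samples per map simply means more points, and the estimate gets closer to π as the point count grows.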

Now, in the same VM, download Gradle and the SpringData-Hadoop distribution. Unzip both of them in your Cloudera home directory. Open your .profile file and add the line below at the end:

export PATH=$PATH:$HOME/gradle-1.0-rc-3/bin

Note that your Gradle version may be different; change the path accordingly.

Now go to <SpringData-Hadoop Home>/samples/batch-wordcount, open the build.gradle file, remove the existing repositories entries and add the following lines:

repositories {
    // Public Spring artefacts
    maven { url "" }
    maven { url "" }
    maven { url "" }
    maven { url "" }
    maven { url "" }
    maven { url "" }
    maven { url "" }
}

Open <SpringData-Hadoop Home>/samples/batch-wordcount/ and modify

hadoopVersion = 0.20.2-cdh3u3

Open <SpringData-Hadoop Home>/samples/batch-wordcount/src/main/resources/ and edit the below lines


Now go to the command prompt and run gradle test; the test should pass. Here is the documentation/tutorial on Spring Hadoop integration.
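
For readers new to MapReduce, the job behind the batch-wordcount sample counts how often each word occurs in its input. A rough plain-Java sketch of the map and reduce steps follows. It deliberately uses no Hadoop or Spring classes, and the names WordCountSketch, map, reduce and count are hypothetical, not taken from the sample.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of word count; hypothetical names, no Hadoop API involved.
class WordCountSketch {

    // "Map" step: emit one (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: group the pairs by word and sum their counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run the map step over every line, then reduce all emitted pairs together.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("to be or not", "to be")));
    }
}
```

In a real cluster the map calls run in parallel on different chunks of the input and the framework does the grouping by key before the reduce step; the sketch does both in one process.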

If you want to learn more about Hadoop, there are good tutorials from Cloudera and YDN; please go through them.

Jumpstart Activiti BPM, Spring MVC and Maven

The source code for this can be found here.

Recently I had an opportunity to try Activiti, an open source BPM tool. If you come from a Java background, it is a good tool for learning concepts like BPM, workflow and BPMN. It has very good Spring integration and supports the BPMN 2.0 standard. This example includes lots of JUnit test cases that run out of the box with just Maven and demonstrate these concepts.

Just to give you some background: a couple of years back, we wrote a web application using Spring MVC for supply chain automation, where the application flow was driven by workflow rules defined outside the application. While exploring Activiti, I realized it is a perfect candidate for externalizing user flows and rules similar to what we had written in Spring MVC. One competing open source tool is jBPM Drools.

A BPM tool models a business process, which can serve 2 main purposes:

  • A business process can be triggered by a web application to handle an order, touching multiple applications (A2A) to complete it
  • Just as important, a business process can interact with a user to complete a set of approval steps, in the form of user tasks

In this article I will focus on the 2nd point and demonstrate a loan processing application using Activiti, BPMN 2.0, Spring Roo MVC and Maven. The code base has 2 projects:

  • Activiti JUnit test cases ported to Maven: This project helps you learn the features of Activiti. To get started quickly, go to this folder and run mvn test. If all the tests pass, import the project into STS IDE, run it in debug mode and see how it works. The prerequisites are Maven and STS IDE; no Activiti installation is needed, because the tests run against Activiti's in-memory engine.
  • Loan processing application using Activiti, Spring Roo MVC and Maven: This example is built on top of Roo. It demonstrates a process in which a user submits a loan and a manager claims the loan approval task and approves the loan. Activiti, like most other BPM tools, does not require you to write an external web application to control the flow; the tools themselves support a basic UI flow. I will also show how to build a basic UI in which one person submits a document and another verifies and approves it. The prerequisites are Maven, STS IDE, Activiti, Ant and Spring Roo.
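
The submit, claim and approve steps of that loan process can be pictured as a small state machine. The sketch below is plain Java with hypothetical names (LoanTaskSketch, State, claim, approve). It does not use the Activiti API; it only illustrates the ordering rules that a BPM engine enforces for you through user tasks.

```java
// Hypothetical stand-in for a loan approval user task; not Activiti code.
class LoanTaskSketch {

    enum State { SUBMITTED, CLAIMED, APPROVED }

    private State state = State.SUBMITTED; // a user has just submitted the loan
    private String assignee;               // null until a manager claims the task

    State getState() {
        return state;
    }

    String getAssignee() {
        return assignee;
    }

    // A manager claims the open approval task; only an unclaimed task can be claimed.
    void claim(String manager) {
        if (state != State.SUBMITTED) {
            throw new IllegalStateException("task is not open for claiming: " + state);
        }
        assignee = manager;
        state = State.CLAIMED;
    }

    // Only the manager who claimed the task may approve it.
    void approve(String manager) {
        if (state != State.CLAIMED || !manager.equals(assignee)) {
            throw new IllegalStateException(manager + " cannot approve in state " + state);
        }
        state = State.APPROVED;
    }

    public static void main(String[] args) {
        LoanTaskSketch task = new LoanTaskSketch();
        task.claim("manager");
        task.approve("manager");
        System.out.println(task.getState());
    }
}
```

With a BPM tool these rules live in the BPMN model rather than in hand-written checks like the ones above, which is the point of externalizing the flow.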

Activiti BPM Details:

Download Activiti BPM and unzip the installation. Then run ant demo.start from the <activiti install>/setup folder.

In approach 1, open a browser, go to http://localhost:8080/activiti-explorer and log in as kermit/kermit. Deploy the bpmn.xml file located at <svn location>/src/main/resources/org/activiti/spring/test/usertask1/LoanProcess1.bpmn20.xml and you will get a user form in Activiti Explorer where you can submit and approve the loan.

In approach 2, deploy the bpmn.xml file located at <svn location>/src/main/resources/org/activiti/spring/test/usertask/LoanProcess.bpmn20.xml . I built a Roo-based Spring MVC application, used “Push In” to generate the controller code for LoanRequestController, and added the code below:

private String startProcess(LoanRequest loanRequest, HttpServletRequest request) {
  // Only a user in the submitter role may start the loan process
  if (request.isUserInRole(submitterRole)) {
    logger.debug("in the startProcess ");
    // Create Activiti process engine
    ProcessEngine processEngine = ProcessEngineConfiguration.createStandaloneProcessEngineConfiguration().buildProcessEngine();
    // Get Activiti services
    RuntimeService runtimeService = processEngine.getRuntimeService();
    taskService = processEngine.getTaskService();
    // Start a new instance of the process with the key "loanProcess"
    ProcessInstance processInstance = runtimeService.startProcessInstanceByKey("loanProcess");
    logger.debug("startProcess processInstance Id=" + processInstance.getId());
  }
  return "";
}

private void approveProcess(LoanRequest loanRequest, HttpServletRequest request) {
  logger.debug("in the approveProcess ");
  // Approval needs an already started process and a user in the approver role
  if (!loanRequest.getProcessId().isEmpty() && request.isUserInRole(approverRole)) {
    // ...
  }
}

Run the application by calling mvn -Dmaven.tomcat.port=8181 clean tomcat:run from the /loanrequest folder. To test, log in as kermit/kermit, create a loan request and submit it. Now log in to Activiti Explorer as gonzo/gonzo and notice that a new process was started with a new task, “Verify loan request”. Then go back to your application, log in as gonzo/gonzo, open the loan request and update it. It will get approved.
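
The two logins in this walkthrough work because the controller gates each step on a role. As a rough illustration of those guards, here is a plain-Java sketch. It stands in for the servlet isUserInRole check with an in-memory table; the class RoleGateSketch, the role names ROLE_SUBMITTER and ROLE_APPROVER, and the assignment of roles to kermit and gonzo are assumptions of mine, not values from the application.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the role checks in the controller.
class RoleGateSketch {

    // Assumed role names, for illustration only.
    static final String SUBMITTER = "ROLE_SUBMITTER";
    static final String APPROVER = "ROLE_APPROVER";

    private final Map<String, Set<String>> rolesByUser = new HashMap<>();

    void grant(String user, String role) {
        rolesByUser.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    boolean isUserInRole(String user, String role) {
        return rolesByUser.getOrDefault(user, Collections.emptySet()).contains(role);
    }

    // Guard for starting the process: the user must be a submitter.
    boolean mayStart(String user) {
        return isUserInRole(user, SUBMITTER);
    }

    // Guard for approving: the request must carry a process id and the user must be an approver.
    boolean mayApprove(String user, String processId) {
        return processId != null && !processId.isEmpty() && isUserInRole(user, APPROVER);
    }

    public static void main(String[] args) {
        RoleGateSketch gate = new RoleGateSketch();
        gate.grant("kermit", SUBMITTER);
        gate.grant("gonzo", APPROVER);
        System.out.println(gate.mayStart("kermit"));
        System.out.println(gate.mayApprove("gonzo", "42"));
    }
}
```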

To learn more, run mvn test and study how the process flow works. The Spring configuration is as below:

<bean id="dataSource-activiti">
  <property name="driverClassName" value="org.h2.Driver" />
  <property name="url" value="jdbc:h2:tcp://localhost/activiti" />
  <property name="username" value="sa" />
  <property name="password" value="" />
  <property name="defaultAutoCommit" value="false" />
</bean>
<bean id="processEngineConfiguration">
  <property name="dataSource" ref="dataSource-activiti" />
  <property name="databaseSchemaUpdate" value="true" />
  <property name="transactionManager" ref="transactionManager" />
  <property name="jpaHandleTransaction" value="false" />
  <property name="jpaCloseEntityManager" value="false" />
  <property name="jobExecutorActivate" value="false" />
</bean>
<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
  <property name="dataSource" ref="dataSource-activiti" />
</bean>
<bean id="processEngine">
  <property name="processEngineConfiguration" ref="processEngineConfiguration" />
</bean>
<bean id="repositoryService" factory-bean="processEngine" factory-method="getRepositoryService" />
<bean id="runtimeService" factory-bean="processEngine" factory-method="getRuntimeService" />
<bean id="taskService" factory-bean="processEngine" factory-method="getTaskService" />
<bean id="historyService" factory-bean="processEngine" factory-method="getHistoryService" />
<bean id="managementService" factory-bean="processEngine" factory-method="getManagementService" />

A quick look at the pom file shows the following 3 additional dependencies:

</dependency> <dependency>