Monthly Archives: May 2012

Cloudfoundry and MongoDB NoSQL sample application

Introduction: MongoDb NoSQL

Download the source code here.

Like a lot of you, I had the same question: what is cloud computing? I started googling, downloaded a few tools, played with them, and understood a few concepts.

In this section I will discuss one of the key concepts of cloud computing – Platform as a Service, a.k.a. PaaS, a.k.a. Cloud Platform. Basically, a Cloud Platform supports development in a local environment and deployment into a remote environment, and the platform “introspects” the application to work out which web server and which database it has to run against. Let me illustrate this with a diagram.

[Diagram: MongoDb NoSQL]

There are a few leading players in this space: Salesforce.com (Heroku, Force.com) and Microsoft (Azure). Recently VMware entered this space with its own Cloudfoundry. Coming from a Java background, this is a good tool for understanding the details of a Cloud Platform. It is well integrated with the STS IDE; it is a bit buggy, but there are workarounds.

This section discusses:

  1. How the Cloudfoundry Cloud Platform supports web servers and databases
  2. How the Cloudfoundry Cloud Platform helps in database seeding
  3. Deploying to the Cloud Platform
  4. The ‘glue’ to inform the Cloud Platform about the web server and database
  5. The ‘glue’ to generate the db schema and the seed data

Details

I will walk you through a simple example using Spring MVC and MongoDB, where you do basic CRUD operations on a person collection. MongoDB is a document-oriented database used to store large amounts of data; it also has Map/Reduce capabilities similar to Hadoop.
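To give a feel for the data access layer before getting into the Cloudfoundry pieces, here is a minimal sketch of that CRUD using Spring Data MongoDB's MongoTemplate; the repository class below is illustrative rather than the exact code from the download.

import org.springframework.data.mongodb.core.MongoTemplate;

// Illustrative only: a tiny repository-style wrapper around MongoTemplate.
public class PersonRepository {

    private final MongoTemplate mongoTemplate;

    public PersonRepository(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    public void save(Person person) {
        mongoTemplate.save(person);                      // insert or update the document
    }

    public Person findById(String id) {
        return mongoTemplate.findById(id, Person.class);
    }

    public void delete(Person person) {
        mongoTemplate.remove(person);
    }
}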

To quickly get started on this:

Cloudfoundry support:

WebServer and database support: Cloudfoundry supports SpringSource tc Server, and it also supports Jetty if used with Maven. On the database side, it supports MySQL, vPostgres and MongoDB. It can introspect the Spring context file and understand which type of database the application uses: if you have a bean of type “org.apache.commons.dbcp.BasicDataSource”, it binds it to the respective relational database; for MongoDB, it needs a mongo-db-factory, as shown in this example.

Database seed data population: Typically, if you have a “jdbc:initialize-database” configuration in your application, Cloudfoundry will execute that script against its bound database.

Configuration in the application:

Deploying to Cloudfoundry: There are 2 ways of deploying the code to Cloudfoundry: the VMC command line and the STS IDE.

If you want to deploy and run the application using the VMC command line, do the following:

  1. Move to the project’s target folder
  2. vmc target http://api.{instancename}.cloudfoundry.me
  3. vmc push
  4. Give the application name as ‘spring-mongodb’
  5. Bind to ‘mongodb’
  6. Save the configuration
  7. Open the browser and go to ‘http://spring-mongodb.{instancename}.cloudfoundry.me/’

If you want to deploy and run the application from the STS IDE, do the following:

  • For setting up your STS to work with Cloudfoundry, refer to this link.
  • Import the Maven project into your STS.
  • Create a new Cloudfoundry server, add the spring-mongodb application, publish the application war to the Cloudfoundry host and see your changes.
  • You can also access the remote system for error logs and files, as mentioned in the SpringSource blog referenced above.

Glue to inform Cloudfoundry about the database:

The key changes you have to make in your application to work with Cloudfoundry are:

  1. Maven changes

<!-- CloudFoundry -->
<dependency>
  <groupId>org.cloudfoundry</groupId>
  <artifactId>cloudfoundry-runtime</artifactId>
  <version>${org.cloudfoundry-version}</version>
</dependency>

  2. MongoDB configuration

The cloudfoundry-runtime library provides a cloud XML namespace for binding the application to its provisioned services; the MongoDB factory and template can be declared along these lines (verify the element names against the cloudfoundry-runtime version you use):

<cloud:mongo-db-factory id="mongoDbFactory" />
<bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
  <constructor-arg ref="mongoDbFactory" />
</bean>

Glue to generate the db schema and the seed data in Cloudfoundry:

You can create a bean called InitService with an init method, and add all the seed data that is needed for the application, as below:

import java.util.UUID;
import org.springframework.data.mongodb.core.MongoTemplate;

public class InitService {

  private MongoTemplate mongoTemplate;

  public MongoTemplate getMongoTemplate() {
    return mongoTemplate;
  }

  public void setMongoTemplate(MongoTemplate mongoTemplate) {
    this.mongoTemplate = mongoTemplate;
  }

  // Called by Spring once the bean is wired (see init-method below);
  // saves a couple of Person documents as seed data.
  private void init() {
    Person p = new Person();
    p.setId(UUID.randomUUID().toString());
    p.setFirstName("John");
    p.setAge(25);
    mongoTemplate.save(p);

    p = new Person();
    p.setId(UUID.randomUUID().toString());
    p.setFirstName("Jane");
    p.setAge(20);
    mongoTemplate.save(p);
    // ... add any other seed data here
  }
}

In the bean context, define a bean along the following lines so that Spring calls init() at startup and the seed data is written to the bound MongoDB instance (the package name here is a placeholder for your application’s package):

<bean id="initService" class="com.mycompany.InitService" init-method="init">
  <property name="mongoTemplate" ref="mongoTemplate" />
</bean>
Other than these changes, everything else is the same as in any other Spring MVC application.

As a way forward, you can get the Cloudfoundry samples. Using the Git utility you can clone them to your local machine. There is a simple hello-spring-mysql sample; you can quickly understand how a MySQL-based application works using that example.


Building a smoke test harness to test web services using SoapUI

We are currently working on a large enterprise integration project. The integration is developed on a Service Oriented Architecture (SOA), where each application exposes itself as a web service. We finished our development cycle, and when we migrated the code from the development to the QA environment, it was very difficult to make sure all the web services were up and giving valid responses – there were close to 200 services, and most of the testing was done manually.

We went back to the drawing board and analysed how we could automate this. SoapUI, JUnit, Maven and Jenkins came to our rescue. The goals of this implementation are as below:

  • Build SoapUI projects to verify that each service is up and responding with a meaningful message, and assert the response
  • Parameterize/externalize the host name, username and password so that we can run the tests in any environment
  • The tests may even be run against live data/production systems, hence they should never modify any real data
  • Have an easy web interface to run the tests

SoapUI project

  • Create a SoapUI project, add the WSDL (Ctrl U), create a request, then create a Test Suite, a Test Case and a Test Step
  • Bind the web service method to the Test Step
  • Hardcode the test values in the Test Step
  • Assert the response by adding an assertion
  • If the web service needs authentication and you want to parameterize the username and password, you need to pass ${#Project#username} and ${#Project#password}
  • Finally, save the project

JUnit and Maven

You need to create a Maven/Java project and add the below dependency

<dependency>
  <groupId>eviware</groupId>
  <artifactId>maven-soapui-plugin</artifactId>
  <version>4.5.0-SNAPSHOT</version>
</dependency>

In your junit class you need to write the below code,

@Test
public void testWebService() throws Exception {
  // SoapUITestCaseRunner comes from com.eviware.soapui.tools in the SoapUI jars
  String soapUIProjectFile = System.getProperty("soapUIProjectFile");
  String hostName = System.getProperty("hostName");
  String userName = System.getProperty("userName");
  String password = System.getProperty("password");

  SoapUITestCaseRunner runner = new SoapUITestCaseRunner();
  runner.setProjectFile(soapUIProjectFile);
  if (hostName != null && hostName.trim().length() > 0) {
    runner.setHost(hostName);
  }
  // Pass the credentials down as SoapUI project properties so the project
  // can pick them up through property expansion
  String[] properties = new String[2];
  properties[0] = (password != null && password.trim().length() > 0) ? "pwd=" + password : "";
  properties[1] = (userName != null && userName.trim().length() > 0) ? "username=" + userName : "";
  runner.setProjectProperties(properties);

  runner.run();
  // runner.getFailedTests() can be inspected here if you want the JUnit report
  // to list exactly which SoapUI test cases failed
}

Integrating with Jenkins

Create a new Maven Jenkins job, and parameterize the above 4 values as below,

mvn test -DsoapUIProjectFile=${soapUIProjectFile} -DhostName=${hostName} -DuserName=${userName} -Dpassword=${password}

2 points to note,

  • Make sure the parameters soapUIProjectFile, hostName, userName and password are created in Jenkins
  • The SoapUI project files are checked into the same unit-test project, and the relative path to the SoapUI project is what is supplied to Jenkins

That is all that needs to be done. Once it is in place, you just need to run the Jenkins job; it will prompt you for this information, and if you pass the values, it will test the web services in that environment.

We can also enhance this to,

  • Test all the projects in a folder recursively (a rough sketch of this follows below)
  • Get meaningful messages out of JUnit reporting how many web services passed, how many failed, and why they failed
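A rough sketch of the first enhancement is below. The folder layout and the project-file suffix are assumptions for illustration, and the runner is the same com.eviware.soapui.tools.SoapUITestCaseRunner used in the JUnit test above.

import java.io.File;
import com.eviware.soapui.tools.SoapUITestCaseRunner;

// Walks a folder recursively and runs every SoapUI project file it finds.
public class SoapUIFolderRunner {

    public void runAll(File folder, String hostName) throws Exception {
        File[] children = folder.listFiles();
        if (children == null) {
            return;                                  // not a directory or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                runAll(child, hostName);             // recurse into sub-folders
            } else if (child.getName().endsWith("-soapui-project.xml")) {
                SoapUITestCaseRunner runner = new SoapUITestCaseRunner();
                runner.setProjectFile(child.getAbsolutePath());
                if (hostName != null && hostName.trim().length() > 0) {
                    runner.setHost(hostName);
                }
                runner.run();                        // failed assertions surface here
            }
        }
    }
}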

Setting up a jPetstore Spring MVC application with SQLFire

In this section I will give a brief introduction to SQLFire, and I will quickly demonstrate a petstore e-commerce application built using Spring MVC, QueryDsl and SQLFire as the backend database.

SQLFire is a distributed database from VMware which can be used to store large amounts of relational data on commodity computers, and it is one of the pieces of their BigData puzzle. It supports JDBC drivers and peer drivers. In this session, I will create a two-server cluster. For more information visit the documentation.

To start the demo,

  • Download SQLFire
  • Download the source code and unzip it
  • Run the command “java -jar vFabric_SQLFire_102_Installer.jar”
  • Start 2 node clusters by doing the following,
    • Under the directory where SQLFire folder was installed create another two directories: server1 and server2.
    • Using sqlf command start two SQLFire server instances that should reside inside the folders that were just created:
      • bin\sqlf server start -dir=server1 -client-port=1527 -mcast-port=12333
      • bin\sqlf server start -dir=server2 -client-port=1528 -mcast-port=12333
  • Run the sqlfire client by executing
    • bin\sqlf
    • connect client ‘localhost:1527’;
    • run ‘<sourcecode download location>\jpetstore\db\sqlfire\jpetstore-sqlfire-schema.sql’
    • run ‘<sourcecode download location>\jpetstore\db\sqlfire\jpetstore-sqlfire-dataload.sql’
  • Install maven on your machine and setup the path

Open pom.xml and replace

<dependency>
  <groupId>com.vmware.sqlfire</groupId>
  <artifactId>sqlfire-client</artifactId>
  <version>1.0.0-Beta</version>
</dependency>

With

<dependency>
  <groupId>com.vmware.sqlfire</groupId>
  <artifactId>sqlfireclient</artifactId>
  <version>1.0.2</version>
</dependency>

And in the repositories block add,

<repository>
  <id>gemstone</id>
  <name>Release bundles for SQLFire and GemFire</name>
  <url>http://dist.gemstone.com.s3.amazonaws.com/maven/release</url>
</repository>

Add a new block under project element,

<pluginRepositories>
  <pluginRepository>
    <id>gemstone</id>
    <name>Release bundles for SQLFire and GemFire</name>
    <url>http://dist.gemstone.com.s3.amazonaws.com/maven/release</url>
  </pluginRepository>
</pluginRepositories>

Run ‘mvn tomcat:run’

Now open the browser and type ‘http://localhost:8080/jpetstore’

Voila, you have the application up and running. Open the Eclipse IDE or STS IDE and import the Maven project to see how the application has been developed. You can also see how SQLFire has been used and how the JDBC connection is set up; a minimal standalone connection check is sketched below.
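If you want a quick sanity check of the JDBC side outside the web application, the sketch below connects to the cluster started above and counts rows in one of the jPetStore tables. The driver class name and URL follow the SQLFire client documentation as I recall them, and CATEGORY comes from the jPetStore schema script; adjust both if your setup differs.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Standalone check that the SQLFire cluster is reachable over JDBC.
public class SqlFireSmokeTest {

    public static void main(String[] args) throws Exception {
        Class.forName("com.vmware.sqlfire.jdbc.ClientDriver");
        Connection conn = DriverManager.getConnection("jdbc:sqlfire://localhost:1527/");
        try {
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM CATEGORY");
            while (rs.next()) {
                System.out.println("Categories loaded: " + rs.getInt(1));
            }
        } finally {
            conn.close();
        }
    }
}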

Datawarehouse implementation using Hadoop+Hbase+Hive+SpringBatch – Part 2

The svn codebase for this article is here.

In continuation of part 1, this section covers,

  1. Setup of a Hadoop, Hbase, Hive on a new Ubuntu VM
  2. Run Hadoop, Hbase and Hive as services
  3. Setup the Spring batch project and run the tests
  4. Some useful commands/tips

To begin with, let me tell you that the choice of Hive was not to use it as a JDBC equivalent; it was more to understand how to use Hive as a powerful datawarehouse analytics engine.

Setup of Hadoop, Hbase and Hive on a new Ubuntu VM

Download the latest Hadoop, Hbase and Hive from the Apache websites. You can also go to the Cloudera website, get the Cloudera Ubuntu VM and use apt-get to install hadoop, hbase and hive. That did not work for me; if you are adventurous you can try it. You can also try MapR’s VMs. Both Cloudera and MapR have good documentation and tutorials.

Unzip the files in the home directory, then edit the .profile file and add the bin directories to the path as below:

export HADOOP_HOME=<HADOOP HOME>
export HBASE_HOME=<HBASE HOME>
export HIVE_HOME=<HIVE HOME>
export PATH=$PATH:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin

sudo mkdir -p /app/hadoop/tmp

sudo chown <login user>:<group> /app/hadoop/tmp

hadoop namenode -format

Set HADOOP_HOME, HBASE_HOME and HIVE_HOME environment variables.
Run ifconfig and get the IP address; it will be something like 192.168.45.129.

Go to the /etc/hosts file and add an entry like,

192.168.45.129 <machine name>

Run Hadoop, Hbase and Hive as services

Go to the hadoop root folder and run the below command,

start-all.sh

Open a browser and access http://localhost:50070/; it will open the Hadoop admin console. If there are any issues, execute the below command and see if there are any exceptions:

tail -f $HADOOP_HOME/logs/hadoop-<login username>-namenode-<machine name>.log

Hadoop runs on port 54310 by default.
Go to the hbase root folder and run the below command,

start-hbase.sh
tail -f $HBASE_HOME/logs/hbase-<login username>-master-<machine name>.log

See if there are any errors. Hbase runs on port 60000 by default.
Go to the hive root folder and run the below command,

hive --service hiveserver -hiveconf hbase.master=localhost:60000 -hiveconf mapred.job.tracker=local

Notice that by passing the hbase.master reference we have integrated Hive with Hbase. Also, the Hive server's default port is 10000. Now run hive as a command line client as follows,

hive -h localhost

Create the seed table as below,

  CREATE TABLE weblogs(key int, client_ip string, day string, month string, year string, hour string,
    minute string, second string, user string, loc  string) row format delimited fields terminated by '\t';

  CREATE TABLE hbase_weblogs_1(key int, client_ip string, day string, month string, year string, hour string,
    minute string, second string, user string, loc  string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, cf1:client_ip, cf2:day, cf3:month, cf4:year, cf5:hour, cf6:minute, cf7:second, cf8:user, cf9:loc")
    TBLPROPERTIES ("hbase.table.name" = "hbase_weblog");

  LOAD DATA LOCAL INPATH '/home/hduser/batch-wordcount/weblogs_parse1.txt' OVERWRITE INTO TABLE weblogs;

  INSERT OVERWRITE TABLE hbase_weblogs_1 SELECT * FROM weblogs;

Setup the Spring batch project and run the tests
To set up this project, get the latest code from the SVN location mentioned at the beginning. Download Gradle and set up the path in .profile. Now run the below command to load the data,

  gradle -Dtest=org.springframework.data.hadoop.samples.DataloadWorkflowTests test

Run the below junit to get the analysis data,

  gradle -Dtest=org.springframework.data.hadoop.samples.AnalyzeWorkflowTests test

The hadoopVersion used is 1.0.2. The build.gradle file looks as below,

repositories {
  // Public Spring artefacts
  mavenCentral()
  maven { url "http://repo.springsource.org/libs-release" }
  maven { url "http://repo.springsource.org/libs-milestone" }
  maven { url "http://repo.springsource.org/libs-snapshot" }
  maven { url "http://www.datanucleus.org/downloads/maven2/" }
  maven { url "http://oss.sonatype.org/content/repositories/snapshots" }
  maven { url "http://people.apache.org/~rawson/repo" }
  maven { url "https://repository.cloudera.com/artifactory/cloudera-repos/"}
}

dependencies {
  compile ("org.springframework.data:spring-data-hadoop:$version")
  { exclude group: 'org.apache.thrift', module: 'thrift' }
  compile "org.apache.hadoop:hadoop-examples:$hadoopVersion"
  compile "org.springframework.batch:spring-batch-core:$springBatchVersion"
  // update the version that comes with Batch
  compile "org.springframework:spring-tx:$springVersion"
  compile "org.apache.hive:hive-service:0.9.0"
  compile "org.apache.hive:hive-builtins:0.9.0"
  compile "org.apache.thrift:libthrift:0.8.0"
  runtime "org.codehaus.groovy:groovy:$groovyVersion"
  // see HADOOP-7461
  runtime "org.codehaus.jackson:jackson-mapper-asl:$jacksonVersion"
  testCompile "junit:junit:$junitVersion"
  testCompile "org.springframework:spring-test:$springVersion"
}

Spring Data Hadoop configuration looks as below,

<configuration>
  <!-- The value after the question mark is the default value if another value for hd.fs is not provided -->
  fs.default.name=${hd.fs:hdfs://localhost:9000}
  mapred.job.tracker=local
</configuration>

<hive-client host="localhost" port="10000" />

Spring Batch job looks as below,

<batch:job id="job1">
  <batch:step id="import">
    <batch:tasklet ref="hive-script"/>
  </batch:step>
</batch:job>
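The workflow tests essentially bootstrap this Spring context and launch job1 through Spring Batch's JobLauncher; a rough plain-Java equivalent is sketched below. The context file location is an assumption, so point it at wherever your batch and Hadoop configuration actually lives.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.support.ClassPathXmlApplicationContext;

// Loads the Spring context and launches the Hive import job defined above.
public class RunImportJob {

    public static void main(String[] args) throws Exception {
        ClassPathXmlApplicationContext ctx =
                new ClassPathXmlApplicationContext("/META-INF/spring/batch-context.xml"); // assumed location
        try {
            JobLauncher launcher = ctx.getBean(JobLauncher.class);
            Job job = ctx.getBean("job1", Job.class);
            JobExecution execution = launcher.run(job, new JobParameters());
            System.out.println("Exit status: " + execution.getExitStatus());
        } finally {
            ctx.close();
        }
    }
}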

Spring Data Hive script for loading the data is as below,

<hive-tasklet id="hive-script">
  <script>
LOAD DATA LOCAL INPATH '/home/hduser/batch-analysis/weblogs_parse.txt' OVERWRITE INTO TABLE weblogs;
INSERT OVERWRITE TABLE hbase_weblogs_1 SELECT * FROM weblogs;
</script>
</hive-tasklet>

Spring Data Hive script for analyzing the data is as below,

<hive-tasklet id="hive-script">
  <script>
    SELECT client_ip, count(user) FROM hbase_weblogs_1 GROUP BY client_ip;
  </script>
</hive-tasklet>

Some useful commands/tips

For querying the Hadoop DFS you can use file-system style unix commands like,

hadoop dfs -ls /
hadoop dfs -mkdir /hbase

If Hadoop has entered safe mode and is not starting up, you can execute the below command,

hadoop dfsadmin -safemode leave

If you want to find errors in the Hadoop filesystem, you can execute the below command,

hadoop fsck /

Datawarehouse implementation using Hadoop+Hbase+Hive+SpringBatch – Part 1

In continuation of my earlier article, I went a little further in understanding what these technologies are, where they are used, and how to implement them technically. I want to present this material in 2 parts,

  1. Introduction to some of the terminologies and explaining the use case we are going to implement
  2. Implementation details of the use case

Hadoop is primarily a distributed file system (DFS) where you can store terabytes/petabytes of data on low end commodity computers. This is similar to how companies like Yahoo and Google store their page feeds.

Hbase is a BigTable implementation which can use either the local file system or the Hadoop DFS to store its data. It is similar to an RDBMS like Oracle. There are 2 main differences:

  1. It is designed to store sparse data; the use case is where there are lots of columns and no data stored in the majority of them
  2. It has powerful versioning capabilities; a typical use case is to track the trajectory of a rocket over a period of time (see the sketch below)
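To make point 2 concrete, here is a small sketch using the HBase Java client API of that era; the table name, column family and row key are made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Each write to the same row/column becomes a new timestamped version,
// so the full history of a rocket's position can be read back later.
public class RocketTrajectoryExample {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "rocket");       // hypothetical table

        Put put = new Put(Bytes.toBytes("falcon-1"));     // one row per rocket
        put.add(Bytes.toBytes("pos"), Bytes.toBytes("coords"), Bytes.toBytes("28.5,-80.6,1200"));
        table.put(put);                                    // call repeatedly as the rocket moves

        Get get = new Get(Bytes.toBytes("falcon-1"));
        get.setMaxVersions(10);                            // fetch up to 10 historical versions
        Result result = table.get(get);
        for (KeyValue kv : result.getColumn(Bytes.toBytes("pos"), Bytes.toBytes("coords"))) {
            System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
        }
        table.close();
    }
}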

Hive is built on the Map Reduce concept and sits on top of Hadoop. It supports a SQL-like query language. Its main purpose is datawarehouse analysis and ad-hoc querying. Hive can also be built on top of Hbase.

Spring Batch is a Spring framework for managing a batch workflow.

Usecase:

Let us say you have a website which gets a lot of traffic. We need to build an application which analyses the traffic: for example, the number of times a user has come to the website, and from which location the website is accessed most. Below is the data flow,

Log file feed structure,

192.168.45.129 07:45
192.168.45.126 07:46
192.168.45.127 07:48
192.168.45.129 07:49

There will be a lookup table in Hbase as below,

192.168.45.129 kp Boston
192.168.45.126 kk Bangalore
192.168.45.127 ak Bangalore

Now we write 2 applications:

  1. A Spring Batch command line job will pull the log feed and push the IP address feed into Hbase
  2. A Java command line application will pull the Hbase information using Hive and display the metrics as follows (a rough sketch is given after the figures below)

[Figure: presentation of user analysis]
[Figure: presentation of location analysis]
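One simple way to build that Java command line application is Hive's JDBC driver (Part 2 of this series wires it differently, through Spring's hive-client). A hedged sketch, assuming the Hive server is listening on localhost:10000 and reusing the hbase_weblogs_1 table name from Part 2, is below.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Queries the Hive table that fronts the Hbase data and prints hit counts per client IP.
public class UserAnalysisReport {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        try {
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT client_ip, count(user) FROM hbase_weblogs_1 GROUP BY client_ip");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
        } finally {
            conn.close();
        }
    }
}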

Now that you know the technology pieces and the high-level solution, in my next article I will show how we can implement this with working code.

Roo vs Grails

We had an application in production, developed in Spring MVC, where the velocity of implementing new features was slow. We had to explore alternative frameworks like Grails and Roo. I personally spent close to a month exploring which one was better.

We explored the following features in both of these, and below are our findings. The final verdict is that we have still not finalized on either one; we are still exploring.

  • Support for database migration from existing application to these frameworks
  • Support for Spring Security with existing user authentication/authorization model for these frameworks
  • Support for standard features like reporting
  • Support for ajax frameworks like JQuery, Dojo, GWT, Flex

Both of these frameworks are well integrated with the STS IDE. If you have to use these frameworks, configure the STS IDE and start developing there; don't use the command line alone, or you cannot take advantage of the round-trip code generation capabilities.

Roo:

This framework is a 100% Java, round-trip-engineering RAD framework which can be used to develop any Spring application, including web applications using Spring MVC and Ajax-based views. The deal breaker here was that Roo was very buggy. It has add-ons for a lot of tools like Jasper, JQuery, Flex etc., but most of them won't work out of the box with 1.2.1.RELEASE. It is easy to build a simple prototype, but building or reverse engineering an enterprise web application is not easy. One aspect in its favor is that, since it is Java based, people who understand Spring MVC, Spring Web Flow etc. can take advantage of it.

Support for database migration from existing application to these frameworks:

In Spring Roo 1.2.1.RELEASE, reverse engineering the database into DAOs and DTOs comes out of the box; the Roo command is as below,

database reverse engineer --schema PUBLIC

Support for Spring Security with existing user authentication/authorization model for these frameworks

When you run the command security setup, it will provide all the security plumbing, including a login page and a default Spring Security context file.

If you have to customize it, you need to configure the Spring applicationContext-security.xml file as below,

<http auto-config="true" use-expressions="true">
	<form-login login-processing-url="/resources/j_spring_security_check" login-page="/login" authentication-failure-url="/login?login_error=t" />
	<logout logout-url="/resources/j_spring_security_logout" />
	<!-- Configure these elements to secure URIs in your application -->
	<intercept-url pattern="/users/**" access="hasRole('admin')"/>
	<intercept-url pattern="/roles/**" access="hasRole('admin')"/>
	<intercept-url pattern="/userrolesmaps/**" access="hasRole('admin')"/>
	<intercept-url pattern="/" access="isAuthenticated()" />
</http>
<!-- Configure Authentication mechanism -->
<authentication-manager alias="authenticationManager">
<!-- SHA-256 values can be produced using 'echo -n your_desired_password | sha256sum' (using normal *nix environments) -->
	<authentication-provider>
		<jdbc-user-service data-source-ref="dataSource"     users-by-username-query="select email_id, password, active
		from custom_user where email_id=?"
		authorities-by-username-query="select u.email_id, r.name from custom_user u, custom_role r, custom_user_roles_map ur
		where u.id = ur.user_id and ur.role_id=r.id and u.email_id=?  " />
	</authentication-provider>
</authentication-manager>

Support for standard features like reporting

There is already an add-on in Roo for Jasper reporting from gvNIX; sadly, this did not work out of the box. I had to write a new controller to plug into Jasper using Maven and write a ton of code.

Support for ajax frameworks like JQuery, Dojo, GWT, Flex

Again, I was not able to make Flex work with Roo out of the box. Dojo comes bundled with Roo and is Ajax enabled. Roo also works with GWT, but making Spring Security work with GWT was a challenge. There is no add-on for JQuery.

Grails:

This framework is Groovy based; there is a learning curve involved for Java/Spring MVC developers to learn this new language, but let me tell you it is worth the effort. This is purely a web development platform. Of these 2 frameworks, this one is more stable and has better overall add-on support with minimal bugs. There is also good JQuery/Ajax support.

Support for database migration from existing application to these frameworks:

There is a db-reverse-engineer add-on, which can create the whole set of plumbing like DTOs, DAOs etc. in Groovy.

Support for Spring Security with existing user authentication/authorization model for these frameworks

It does provide Spring Security support, with sample user authentication Groovy code ready to be used, but configuring it to work with existing schemas was not easy, and there are not a lot of documents on the web showing how to implement it.

Support for standard features like reporting

Still exploring

Support for ajax frameworks like JQuery, Dojo, GWT, Flex

Still exploring

Jumpstart Grails with real life example: Agile Tracking Tool

Recently I wanted to try out Grails. I was trying out a lot of different samples/examples/open-source projects. With the new Grails 2.0.3 upgrade, none of these applications worked out of the box, including the Grails samples. Finally I narrowed down to one open-source project – Agile Tracking Tool. Within an hour or so I was able to make it work on Grails 2.0.3. It clearly demonstrates aspects like,

  • Spring Security
  • Graphs and Charts
  • Ajax using prototype framework

I am also contributing to this code base by making it work with the latest Grails, and I am currently fixing authentication/authorization with roles like admin, scrummaster, developer, etc.

To quickly jump start setting up the environment,

  • Download the latest Grails @ http://www.grails.org/download/file?mirror=212
  • This assumes that you have Java 6 installed on your machine; add the <grails installation folder>/bin folder to the path
  • Get the latest code @ https://agiletrackingtool.googlecode.com/svn/trunk/ using Tortoise SVN
  • Open the command prompt, go to the folder where you just got the code, and run grails run-app
  • Open the browser and type in http://localhost:8080/AgileTrackingTool/
  • Log in as scrummaster/scrummaster and load the sample data by clicking the link “Load example project data”; also notice you don't have the admin link
  • Once you load the data, you can see burndown charts etc.
  • Log in as admin/admin, and you get the admin link

The next step is to import this project into the STS IDE as a Grails project and analyse how the charts work, how the Ajax works, etc.