Tuesday, December 3, 2013

Install Storm Cluster on Mac

In this blog post, I will show how to install a Storm cluster on a Mac, describe some of the challenges I ran into, and explain how I troubleshot and resolved them. In the next post, I will show how to use a RabbitMQ spout and write the word count results to Redis.

Storm's Major Components

A Storm installation consists of four major components:

  1. ZooKeeper
  2. zeroMQ
  3. jzmq (Java bridge for zeroMQ)
  4. Storm
Start here if one wants to read more about Storm.

Installing Storm in Development Environment

Follow the instructions here if one wants to set up Storm in a development environment.

Install Storm Cluster on Mac

Our challenge, however, is to install a Storm cluster on a single Mac. Here is a good article on how to accomplish that, with one exception: instead of installing the jzmq mentioned in the article, one must install this branch of jzmq for Mac.

More on this topic in later sections.
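For reference, a minimal single-node storm.yaml might look like the one this sketch writes. It assumes Storm 0.9.x configuration keys; the function name write_storm_yaml and both directory arguments are hypothetical. The three supervisor ports line up with the worker-6700/6701/6702 log files mentioned later, and pointing storm.local.dir at a user-writable directory should make sudo unnecessary when starting the daemons.

```shell
# Hypothetical helper: write a minimal single-node storm.yaml.
# Assumes Storm 0.9.x config keys; adjust paths to your own install.
write_storm_yaml() {
  local conf_dir="$1"   # e.g. "$STORM_HOME/conf"
  local local_dir="$2"  # Storm's local data dir (storm.local.dir)
  mkdir -p "$conf_dir" "$local_dir"
  # Unquoted EOF so that $local_dir is expanded into the file.
  cat > "$conf_dir/storm.yaml" <<EOF
storm.zookeeper.servers:
  - "localhost"
storm.zookeeper.port: 2181
nimbus.host: "localhost"
storm.local.dir: "$local_dir"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
ui.port: 8080
EOF
}
```

Usage: `write_storm_yaml "$STORM_HOME/conf" "$HOME/storm-local"`, then start the daemons as usual.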

Start and Test Storm Cluster

Start ZooKeeper

sudo ./zkServer.sh start
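Before starting Storm, it is worth confirming that ZooKeeper is actually serving requests. `./zkServer.sh status` reports the server mode; alternatively, ZooKeeper 3.4.x answers the four-letter command ruok with imok (newer releases may require enabling four-letter commands first). The helper name zk_ok is hypothetical, and the sketch assumes nc (netcat) is installed and ZooKeeper listens on the default port 2181.

```shell
# Hypothetical helper: send ZooKeeper's "ruok" four-letter command and
# succeed only if the server answers "imok".
# Assumes nc (netcat) is installed; host and port default to localhost:2181.
zk_ok() {
  local host="${1:-localhost}" port="${2:-2181}"
  local reply
  reply=$(echo ruok | nc "$host" "$port" 2>/dev/null)
  [ "$reply" = "imok" ]
}
```

Usage: `zk_ok && echo "ZooKeeper is up"`.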

Start Storm Cluster

Depending on where one configures Storm to store its local data (the storm.local.dir setting), one might need to start the Storm daemons with sudo.

Start Nimbus

sudo ./storm nimbus

Start Supervisor

sudo ./storm supervisor

Start UI

sudo ./storm ui
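Each of the three commands above occupies its own terminal. A sketch that launches all three in the background, sending each daemon's console output to its own file, might look like this; start_storm_daemons, STORM_BIN and SUDO are hypothetical names, not part of Storm.

```shell
# Hypothetical helper: start nimbus, supervisor and ui in the background.
# STORM_BIN defaults to ./storm (run from STORM_HOME/bin); set SUDO=sudo
# if storm.local.dir is not writable by the current user.
start_storm_daemons() {
  local storm_bin="${STORM_BIN:-./storm}"
  local out_dir="${1:-.}"   # where <daemon>.out console files go
  local daemon
  for daemon in nimbus supervisor ui; do
    # ${SUDO:-} expands to nothing when SUDO is unset or empty.
    nohup ${SUDO:-} "$storm_bin" "$daemon" > "$out_dir/$daemon.out" 2>&1 &
    echo "started $daemon (pid $!)"
  done
}
```

If SUDO=sudo is used, running `sudo -v` first caches the credentials so that sudo does not try to prompt from a background job.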

Check the Storm UI and Verify the Storm Cluster Is Up and Running

Go to http://localhost:8080, and one should see the Storm UI's cluster summary page.
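The UI can take a few seconds to come up after `storm ui` starts. A small polling loop saves refreshing the browser by hand; it assumes curl is installed, and wait_for_storm_ui is a hypothetical name. When nothing is listening yet, curl reports the status code as 000.

```shell
# Hypothetical helper: poll the Storm UI until it answers HTTP 200,
# giving up after a number of one-second attempts.
wait_for_storm_ui() {
  local url="${1:-http://localhost:8080}" attempts="${2:-10}"
  local i code
  for ((i = 1; i <= attempts; i++)); do
    # -w prints only the HTTP status code; 000 means no connection.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    [ "$code" = "200" ] && return 0
    sleep 1
  done
  return 1
}
```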


Build and Deploy Storm-Starter to Storm Cluster

Download Storm-Starter

Download storm-starter from GitHub:
git clone https://github.com/nathanmarz/storm-starter.git

Build and Package Storm-Starter

cd storm-starter
mvn -f m2-pom.xml package

This will produce storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar under the target directory.

Deploy and Run a Topology

storm jar target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology wordCountTopology

If everything deploys correctly, one should see messages similar to the following:
  [main] INFO backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...  
 10  [main] INFO backtype.storm.StormSubmitter - Uploading topology jar storm-starter-0.0.1-SNAPSHOT.jar to assigned location: /usr/local/var/run/zookeeper/data/nimbus/inbox/stormjar-19e2b1f9-df68-4105-b409-b4190d3d4efa.jar  
 76  [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /usr/local/var/run/zookeeper/data/nimbus/inbox/stormjar-19e2b1f9-df68-4105-b409-b4190d3d4efa.jar  
 76  [main] INFO backtype.storm.StormSubmitter - Submitting topology wordCountToplogy in distributed mode with conf {"topology.workers":3,"topology.debug":true}  
 431 [main] INFO backtype.storm.StormSubmitter - Finished submitting topology: wordCountToplogy  
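The argument after the class name is passed to WordCountTopology's main method, which in storm-starter uses it as the topology name; that is also the name storm kill expects later, so the two must match. A sketch of a helper that finds the fat jar under target/ and submits it, run from the storm-starter directory, might look like this; submit_word_count and STORM_BIN are hypothetical.

```shell
# Hypothetical helper: submit storm-starter's WordCountTopology under the
# given topology name, using the fat jar built by the mvn package step.
submit_word_count() {
  local name="$1"
  local jars=(target/*-jar-with-dependencies.jar)
  # If the glob matched nothing, jars[0] is the literal pattern.
  if [ ! -e "${jars[0]}" ]; then
    echo "no *-jar-with-dependencies.jar under target/; run mvn package first" >&2
    return 1
  fi
  "${STORM_BIN:-storm}" jar "${jars[0]}" storm.starter.WordCountTopology "$name"
}
```

Usage: `submit_word_count wordCountTopology`.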

Now go back to the Storm UI; the newly deployed topology should show up there.




Even though the topology shows up in the UI, this doesn't mean Storm is working properly. We need to drill down to the topology's own page and verify that the numbers of emitted and transferred tuples are greater than zero.
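The submit log shows "topology.debug":true, so the workers should also be logging tuple emissions. Assuming that in debug mode Storm's executors write one line containing "Emitting:" per emitted tuple (an assumption about the log format, so check one's own worker logs), counting those lines gives a quick cross-check of the UI numbers. count_emitted is a hypothetical helper.

```shell
# Hypothetical helper: count tuple-emission lines across worker logs.
# Assumes topology.debug=true, so executors log "Emitting: ..." per tuple.
count_emitted() {
  local log_dir="${1:-$STORM_HOME/logs}"
  # grep -c prints 0 (and exits 1) when nothing matches.
  cat "$log_dir"/worker-*.log 2>/dev/null | grep -c 'Emitting:'
}
```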

Stop or kill a Topology

./storm kill wordCountTopology


Where things could go wrong

There are many things that can go wrong when setting up a Storm cluster, from incompatible zeroMQ/jzmq versions to file permission issues, and they can cost countless hours of frustration and searching the Internet. Here are some of the problems I ran into and how I managed to resolve them.

Log files are your Friend

The log files for nimbus, supervisor, and ui are placed under the STORM_HOME/logs directory. In addition to nimbus.log, supervisor.log, and ui.log, one should find worker-6700.log, worker-6701.log, worker-6702.log, etc. Go through these files to make sure they contain no errors or exceptions. If there is a file-permission-related error, one should be able to spot it in one of them.
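Rather than opening each file in turn, one can grep all of them at once. This sketch (scan_storm_logs is a hypothetical name) prints every line mentioning ERROR, an exception, or "Permission denied", prefixed with the file name and line number.

```shell
# Hypothetical helper: scan nimbus, supervisor, ui and worker logs for
# errors, exceptions and permission problems in a single pass.
scan_storm_logs() {
  local log_dir="${1:-$STORM_HOME/logs}"
  # -H always prints the file name, -n the line number.
  grep -H -n -E 'ERROR|Exception|Permission denied' "$log_dir"/*.log
}
```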

zeroMQ and jzmq compatibility

The other hard-to-track-down class of issues is related to zeroMQ and the jzmq bridge.

Only Use zeroMQ version 2.1.7

If one gets an invalid-parameter exception from zeroMQ, the wrong zeroMQ version is being used.
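One way to check which zeroMQ version is installed is to read the version macros from zmq.h. The sketch assumes a source build installed under the default /usr/local prefix; zmq_version is a hypothetical helper.

```shell
# Hypothetical helper: print the zeroMQ version defined in zmq.h,
# e.g. "2.1.7". Pass the header path if it is not under /usr/local.
zmq_version() {
  local header="${1:-/usr/local/include/zmq.h}"
  [ -f "$header" ] || { echo "zmq.h not found at $header" >&2; return 1; }
  local major minor patch
  major=$(awk '/#define ZMQ_VERSION_MAJOR/ {print $3; exit}' "$header")
  minor=$(awk '/#define ZMQ_VERSION_MINOR/ {print $3; exit}' "$header")
  patch=$(awk '/#define ZMQ_VERSION_PATCH/ {print $3; exit}' "$header")
  echo "$major.$minor.$patch"
}
```

Usage: `[ "$(zmq_version)" = "2.1.7" ] || echo "zeroMQ is not 2.1.7"`.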

Only Use the Correct jzmq for Mac

Instead of installing the jzmq mentioned in the article, one must install this branch of jzmq for Mac.
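After installing jzmq, its native library has to be on the java.library.path that Storm gives its workers. This sketch checks a colon-separated list of directories for libjzmq.*; the default list, /usr/local/lib:/opt/local/lib:/usr/lib, is what I believe Storm ships as its default java.library.path, but treat it as an assumption and compare it with one's own configuration. find_jzmq is a hypothetical name.

```shell
# Hypothetical helper: print every libjzmq.* found in a colon-separated
# list of directories; fail if none is found.
# The default list is an assumed copy of Storm's java.library.path.
find_jzmq() {
  local path="${1:-/usr/local/lib:/opt/local/lib:/usr/lib}"
  local dir lib found=1
  local IFS=':'   # split $path on colons
  for dir in $path; do
    for lib in "$dir"/libjzmq.*; do
      [ -e "$lib" ] && { echo "$lib"; found=0; }
    done
  done
  return $found
}
```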

What to do when a worker keeps crashing

If the supervisor keeps reporting that the worker is getting killed, and there is no exception in any of the log files, check for hs_err_pid<pid>.log files (where <pid> is the process ID) under the STORM_HOME/bin directory or the directory from which Storm was launched. These are crash logs written by the JVM itself. If one finds any, chances are there is an incompatibility between the zeroMQ library and the jzmq bridge; the hs_err_pid log should have all the details.
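The JVM writes these files into the working directory of the crashed process by default, which is why they end up next to wherever Storm was started. This sketch (find_jvm_crash_logs is a hypothetical helper) lists any such files in the given directories and prints the top of each, where the JVM records the signal and the "Problematic frame".

```shell
# Hypothetical helper: list JVM crash logs (hs_err_pid<pid>.log) in the
# given directories and show the first lines of each one.
find_jvm_crash_logs() {
  local dir f
  for dir in "$@"; do
    for f in "$dir"/hs_err_pid*.log; do
      [ -e "$f" ] || continue   # glob matched nothing in this directory
      echo "== $f"
      head -n 20 "$f"
    done
  done
}
```

Usage: `find_jvm_crash_logs "$STORM_HOME/bin" "$PWD"`.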

The other option is to run the worker command by hand and see whether it starts up or crashes.
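To get the exact worker command, the supervisor log is a good place to look. This sketch assumes supervisor.log contains lines of the form "Launching worker with command: java ..." (what I would expect from the Storm 0.9 supervisor, but verify against one's own log); last_worker_command is a hypothetical helper.

```shell
# Hypothetical helper: print the most recent worker launch command logged
# by the supervisor, so it can be copied and run by hand.
last_worker_command() {
  local log="${1:-$STORM_HOME/logs/supervisor.log}"
  grep 'Launching worker with command: ' "$log" | tail -n 1 |
    sed 's/.*Launching worker with command: //'
}
```

Running the printed command directly in a terminal should show whether the JVM starts or crashes, along with any native error output that never reaches the log files.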