Tuesday, December 3, 2013

Install Storm Cluster on Mac

In this blog, I will show how to install a Storm cluster on a Mac, describe some of the challenges I ran into, and explain how to troubleshoot and resolve those issues. In the next blog, I will show how to use a RabbitMQ spout and write the word count results to Redis.

Storm's Major Components

Storm consists of four major components,

  1. Zookeeper
  2. zeroMQ
  3. jzmq (java bridge for zeroMQ)
  4. Storm
Start here if one wants to read more about Storm.

Installing Storm in Development Environment

Follow the instructions here if one wants to set up Storm in a development environment.

Install Storm Cluster on Mac

Our challenge, however, is to install a Storm cluster on a single Mac. Here is a good article on how to accomplish that, with one exception: instead of installing the jzmq mentioned in the article, one must install this branch of jzmq for Mac.

More on this topic in later sections.

Start and Test Storm Cluster

Start ZooKeeper

sudo ./zkServer.sh start

Start Storm Cluster

Depending on where one configures Storm to store its temporary data, one might need to start Storm with sudo.

Start Nimbus

 sudo ./storm nimbus

Start Supervisor

sudo ./storm supervisor

Start UI

sudo ./storm ui

Check the Storm UI and Verify the Storm Cluster is Up and Running

Go to http://localhost:8080 and one should see the following screen,


Build and Deploy Storm-Starter to Storm Cluster

Download Storm-Starter

Download storm-starter from GitHub,
git clone https://github.com/nathanmarz/storm-starter.git

Build and Package Storm-Starter

cd storm-starter
mvn -f m2-pom.xml package

This will produce storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar under the target directory.

Deploy and Run a Topology

storm jar storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology WordCountTopology

If everything deploys correctly, one should see the following messages,
  [main] INFO backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...  
 10  [main] INFO backtype.storm.StormSubmitter - Uploading topology jar storm-starter-0.0.1-SNAPSHOT.jar to assigned location: /usr/local/var/run/zookeeper/data/nimbus/inbox/stormjar-19e2b1f9-df68-4105-b409-b4190d3d4efa.jar  
 76  [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /usr/local/var/run/zookeeper/data/nimbus/inbox/stormjar-19e2b1f9-df68-4105-b409-b4190d3d4efa.jar  
 76  [main] INFO backtype.storm.StormSubmitter - Submitting topology wordCountToplogy in distributed mode with conf {"topology.workers":3,"topology.debug":true}  
 431 [main] INFO backtype.storm.StormSubmitter - Finished submitting topology: wordCountToplogy  
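
For reference, the submitted topology boils down to a few spout and bolt classes wired together and handed to StormSubmitter. Below is a minimal, self-contained sketch of a word-count topology, not the actual storm-starter source: the spout and bolt are simplified stand-ins and the parallelism hints are arbitrary. StormSubmitter picks up the nimbus host from the storm.yaml on the machine running the storm jar command.

 import java.util.HashMap;
 import java.util.Map;

 import backtype.storm.Config;
 import backtype.storm.StormSubmitter;
 import backtype.storm.spout.SpoutOutputCollector;
 import backtype.storm.task.TopologyContext;
 import backtype.storm.topology.BasicOutputCollector;
 import backtype.storm.topology.OutputFieldsDeclarer;
 import backtype.storm.topology.TopologyBuilder;
 import backtype.storm.topology.base.BaseBasicBolt;
 import backtype.storm.topology.base.BaseRichSpout;
 import backtype.storm.tuple.Fields;
 import backtype.storm.tuple.Tuple;
 import backtype.storm.tuple.Values;
 import backtype.storm.utils.Utils;

 public class MiniWordCountTopology {

   // Spout: emits the same sentence once a second.
   public static class SentenceSpout extends BaseRichSpout {
     private SpoutOutputCollector collector;

     public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
       this.collector = collector;
     }

     public void nextTuple() {
       Utils.sleep(1000);
       collector.emit(new Values("the quick brown fox"));
     }

     public void declareOutputFields(OutputFieldsDeclarer declarer) {
       declarer.declare(new Fields("sentence"));
     }
   }

   // Bolt: splits each sentence into words and emits a running count per word.
   public static class WordCountBolt extends BaseBasicBolt {
     private final Map<String, Integer> counts = new HashMap<String, Integer>();

     public void execute(Tuple tuple, BasicOutputCollector collector) {
       for (String word : tuple.getString(0).split(" ")) {
         Integer count = counts.get(word);
         count = (count == null) ? 1 : count + 1;
         counts.put(word, count);
         collector.emit(new Values(word, count));
       }
     }

     public void declareOutputFields(OutputFieldsDeclarer declarer) {
       declarer.declare(new Fields("word", "count"));
     }
   }

   public static void main(String[] args) throws Exception {
     TopologyBuilder builder = new TopologyBuilder();
     builder.setSpout("sentences", new SentenceSpout(), 1);
     builder.setBolt("counts", new WordCountBolt(), 2).shuffleGrouping("sentences");

     Config conf = new Config();
     conf.setDebug(true);
     conf.setNumWorkers(3);

     // Submits to the cluster defined by nimbus.host in storm.yaml.
     StormSubmitter.submitTopology("miniWordCount", conf, builder.createTopology());
   }
 }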

Now, go back to the Storm UI and we should see the newly deployed topology show up.




Even though the topology shows up in the UI, this doesn't mean Storm is working properly. We need to drill down to the actual topology level and verify that the numbers of emitted and transferred messages are greater than zero.

Stop or kill a Topology

./storm kill WordCountTopology


Where things could go wrong

There are many things that can go wrong when setting up a Storm cluster, from incompatible zeroMQ/jzmq versions to file permission issues, and they can cause countless hours of frustration and Internet searching. Here are some of the problems I ran into and how I managed to resolve them.

Log Files Are Your Friend

The log files for nimbus, supervisor, and ui are placed under the STORM_HOME/logs directory. In addition to nimbus.log, supervisor.log, and ui.log, one should find worker-6700.log, worker-6701.log, worker-6702.log, etc. Go through these log files to make sure there are no errors or exceptions. If there is a file-permission-related error, one should be able to spot it in one of the log files.

zeroMQ and jzmq compatibility

The other hard-to-track-down issues are mostly related to zeroMQ and the jzmq bridge.

Only Use zeroMQ version 2.1.7

If one gets an invalid parameter exception from zeroMQ, the wrong zeroMQ version is being used.

Only Use the Correct jzmq for Mac

Instead of installing the jzmq mentioned in the article, one must install this branch of jzmq for Mac.

What happens when my worker thread keeps crashing

If the supervisor keeps reporting that the worker thread keeps getting killed, and there is no exception in any of the log files, check for hs_err_pid.log files (where pid is the process id) under the STORM_HOME/bin directory or the directory where Storm was launched. If one finds hs_err_pid.log files, chances are there is some incompatibility between the zeroMQ server and the jzmq bridge. The hs_err_pid.log file should have all the details.

The other option is to start the worker command by hand and see whether it starts up or crashes.

Friday, February 8, 2013

Use Mule LDAP Connector to Create a New User

In this blog, we will show how to create a new user in OpenLDAP using the Mule LDAP connector.

This flow has been tested against a Mac OpenLDAP server, using the core schema. We also need to set up an LDAP tree like ou=people,dc=example,dc=com.

Here is the flow we are using,

  1. Receive the create-user request via an HTTP GET request (passing in only the dn, the distinguished name)
  2. Look up the dn in LDAP and return the lookup result
  3. If the lookup fails with a NameNotFoundException, the exception flow is invoked
  4. A Java transformer pulls out the dn, fills in a set of pre-defined user data (for demo purposes), and puts them in a map
  5. The LDAP transformer transforms the map into an LDAPEntry
  6. The LDAP connector is called to insert the newly created LDAPEntry into LDAP

But first, we need to install the LDAP connector by going to Mule Studio/Help/Install New Software. Select "MuleStudio Cloud Connectors Update Site" and search for "ldap". Check "LDAP Connector Mule Studio Extension" and proceed to install it.


Next, we need to configure an LDAP global element by clicking the "Add" button under Global Elements.
Search for "ldap", select "LDAP" under "Cloud Connectors", and click "OK".

Fill in "Name", "Principal DN", "Password", and "URL", then click "OK".


Here is a picture of the Mule flow. The normal flow returns if the LDAP entry already exists in the system. The exception flow is called when the lookup fails, and that is where the LDAP user is added.

Here is the Java code for the LDAP-to-Map transformer. This is for demo purposes, so a lot of hard-coded information is used,
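
Below is a minimal sketch of what such a transformer might look like in Mule 3.3; the class name is hypothetical, the dn is assumed to arrive as an inbound property from the HTTP transport, and the hard-coded attributes mirror the values that show up in the log output further down.

 import java.util.HashMap;
 import java.util.Map;

 import org.mule.api.MuleMessage;
 import org.mule.api.transformer.TransformerException;
 import org.mule.transformer.AbstractMessageTransformer;

 // Hypothetical transformer: builds the attribute map that the LDAP
 // map-to-LDAP-entry transformer turns into an LDAPEntry.
 public class DnToUserMapTransformer extends AbstractMessageTransformer {

   @Override
   public Object transformMessage(MuleMessage message, String outputEncoding)
       throws TransformerException {
     // With the HTTP transport, the dn query parameter is assumed to be
     // available as an inbound property on the message.
     String dn = message.getInboundProperty("dn");

     // Everything except the dn is hard coded for demo purposes.
     Map<String, Object> entry = new HashMap<String, Object>();
     entry.put("dn", dn);
     entry.put("cn", "foobar");
     entry.put("sn", "foobar");
     entry.put("objectClass", "person");
     return entry;
   }
 }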

To test this flow, enter the following URL, 
http://localhost:8082/ldap?dn=cn=ldapUser,ou=people,dc=example,dc=com

This causes a new LDAP user to be created under ou=people,dc=example,dc=com.

We can verify this by looking at the Mule flow output,
 log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@1feed786.  
 log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@1feed786 class loader.  
 log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource().  
 log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@1feed786.  
 log4j: Using URL [jar:file:/Users/legu/mule-resources/MuleStudio/plugins/org.mule.tooling.server.3.3.2.ee_1.3.2.201212121942/mule/tooling/tooling-support-3.3.1.jar!/log4j.properties] for automatic log4j configuration.  
 log4j: Reading configuration from URL jar:file:/Users/legu/mule-resources/MuleStudio/plugins/org.mule.tooling.server.3.3.2.ee_1.3.2.201212121942/mule/tooling/tooling-support-3.3.1.jar!/log4j.properties  
 log4j: Parsing for [root] with value=[INFO, console].  
 log4j: Level token is [INFO].  
 log4j: Category root set to INFO  
 log4j: Parsing appender named "console".  
 log4j: Parsing layout options for "console".  
 log4j: Setting property [conversionPattern] to [%-5p %d [%t] %c: %m%n].  
 log4j: End of parsing for "console".  
 log4j: Parsed "console" options.  
 log4j: Parsing for [com.mycompany] with value=[DEBUG].  
 log4j: Level token is [DEBUG].  
 log4j: Category com.mycompany set to DEBUG  
 log4j: Handling log4j.additivity.com.mycompany=[null]  
 log4j: Parsing for [org.springframework.beans.factory] with value=[WARN].  
 log4j: Level token is [WARN].  
 log4j: Category org.springframework.beans.factory set to WARN  
 log4j: Handling log4j.additivity.org.springframework.beans.factory=[null]  
 log4j: Parsing for [org.apache] with value=[WARN].  
 log4j: Level token is [WARN].  
 log4j: Category org.apache set to WARN  
 log4j: Handling log4j.additivity.org.apache=[null]  
 log4j: Parsing for [org.mule] with value=[INFO].  
 log4j: Level token is [INFO].  
 log4j: Category org.mule set to INFO  
 log4j: Handling log4j.additivity.org.mule=[null]  
 log4j: Parsing for [org.hibernate.engine.StatefulPersistenceContext.ProxyWarnLog] with value=[ERROR].  
 log4j: Level token is [ERROR].  
 log4j: Category org.hibernate.engine.StatefulPersistenceContext.ProxyWarnLog set to ERROR  
 log4j: Handling log4j.additivity.org.hibernate.engine.StatefulPersistenceContext.ProxyWarnLog=[null]  
 log4j: Finished configuring.  
 INFO 2013-02-08 16:17:51,201 [main] org.mule.module.launcher.application.DefaultMuleApplication:   
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 + New app 'ldap_test'                   +  
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 INFO 2013-02-08 16:17:51,203 [main] org.mule.module.launcher.application.DefaultMuleApplication:   
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 + Initializing app 'ldap_test'               +  
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 INFO 2013-02-08 16:17:51,382 [main] org.mule.lifecycle.AbstractLifecycleManager: Initialising RegistryBroker  
 INFO 2013-02-08 16:17:51,546 [main] org.mule.config.spring.MuleApplicationContext: Refreshing org.mule.config.spring.MuleApplicationContext@48ff4cf: startup date [Fri Feb 08 16:17:51 EST 2013]; root of context hierarchy  
 WARN 2013-02-08 16:17:52,585 [main] org.mule.config.spring.parsers.assembly.DefaultBeanAssembler: Cannot assign class java.lang.Object to interface org.mule.api.AnnotatedObject  
 INFO 2013-02-08 16:17:53,363 [main] org.mule.lifecycle.AbstractLifecycleManager: Initialising model: _muleSystemModel  
 WARN 2013-02-08 16:17:53,446 [main] org.springframework.beans.GenericTypeAwarePropertyDescriptor: Invalid JavaBean property 'port' being accessed! Ambiguous write methods found next to actually used [public void org.mule.endpoint.URIBuilder.setPort(java.lang.String)]: [public void org.mule.endpoint.URIBuilder.setPort(int)]  
 INFO 2013-02-08 16:17:53,525 [main] org.mule.lifecycle.AbstractLifecycleManager: Initialising connector: connector.http.mule.default  
 INFO 2013-02-08 16:17:53,591 [main] org.mule.construct.FlowConstructLifecycleManager: Initialising flow: LDAP_ADD_IF_NOT_EXISTS  
 INFO 2013-02-08 16:17:53,591 [main] org.mule.exception.CatchMessagingExceptionStrategy: Initialising exception listener: org.mule.exception.CatchMessagingExceptionStrategy@4cfed6f7  
 INFO 2013-02-08 16:17:53,599 [main] org.mule.processor.SedaStageLifecycleManager: Initialising service: LDAP_ADD_IF_NOT_EXISTS.stage1  
 INFO 2013-02-08 16:17:53,610 [main] org.mule.config.builders.AutoConfigurationBuilder: Configured Mule using "org.mule.config.spring.SpringXmlConfigurationBuilder" with configuration resource(s): "[ConfigResource{resourceName='/Users/legu/MuleStudio/workspace/.mule/apps/ldap_test/LDAP Create USER if not exist.xml'}]"  
 INFO 2013-02-08 16:17:53,610 [main] org.mule.config.builders.AutoConfigurationBuilder: Configured Mule using "org.mule.config.builders.AutoConfigurationBuilder" with configuration resource(s): "[ConfigResource{resourceName='/Users/legu/MuleStudio/workspace/.mule/apps/ldap_test/LDAP Create USER if not exist.xml'}]"  
 INFO 2013-02-08 16:17:53,610 [main] org.mule.module.launcher.application.DefaultMuleApplication: Monitoring for hot-deployment: /Users/legu/MuleStudio/workspace/.mule/apps/ldap_test/LDAP Create USER if not exist.xml  
 INFO 2013-02-08 16:17:53,612 [main] org.mule.module.launcher.application.DefaultMuleApplication:   
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 + Starting app 'ldap_test'                 +  
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 INFO 2013-02-08 16:17:53,619 [main] org.mule.util.queue.TransactionalQueueManager: Starting ResourceManager  
 INFO 2013-02-08 16:17:53,621 [main] org.mule.util.queue.TransactionalQueueManager: Started ResourceManager  
 INFO 2013-02-08 16:17:53,623 [main] org.mule.transport.http.HttpConnector: Connected: HttpConnector  
 {  
  name=connector.http.mule.default  
  lifecycle=initialise  
  this=12133926  
  numberOfConcurrentTransactedReceivers=4  
  createMultipleTransactedReceivers=true  
  connected=true  
  supportedProtocols=[http]  
  serviceOverrides=<none>  
 }  
 INFO 2013-02-08 16:17:53,623 [main] org.mule.transport.http.HttpConnector: Starting: HttpConnector  
 {  
  name=connector.http.mule.default  
  lifecycle=initialise  
  this=12133926  
  numberOfConcurrentTransactedReceivers=4  
  createMultipleTransactedReceivers=true  
  connected=true  
  supportedProtocols=[http]  
  serviceOverrides=<none>  
 }  
 INFO 2013-02-08 16:17:53,623 [main] org.mule.lifecycle.AbstractLifecycleManager: Starting connector: connector.http.mule.default  
 INFO 2013-02-08 16:17:53,626 [main] org.mule.module.ldap.agents.DefaultSplashScreenAgent:   
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 + DevKit Extensions (1) used in this application                +  
 + LDAP 1.0.1 (DevKit 3.3.1 Build UNNAMED.1297.150f2c9)+            +  
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 INFO 2013-02-08 16:17:53,627 [main] org.mule.lifecycle.AbstractLifecycleManager: Starting model: _muleSystemModel  
 INFO 2013-02-08 16:17:53,628 [main] org.mule.construct.FlowConstructLifecycleManager: Starting flow: LDAP_ADD_IF_NOT_EXISTS  
 INFO 2013-02-08 16:17:53,628 [main] org.mule.processor.SedaStageLifecycleManager: Starting service: LDAP_ADD_IF_NOT_EXISTS.stage1  
 INFO 2013-02-08 16:17:53,634 [main] org.mule.transport.http.HttpConnector: Registering listener: LDAP_ADD_IF_NOT_EXISTS on endpointUri: http://localhost:8082/ldap  
 INFO 2013-02-08 16:17:53,639 [main] org.mule.transport.service.DefaultTransportServiceDescriptor: Loading default response transformer: org.mule.transport.http.transformers.MuleMessageToHttpResponse  
 INFO 2013-02-08 16:17:53,641 [main] org.mule.lifecycle.AbstractLifecycleManager: Initialising: 'null'. Object is: HttpMessageReceiver  
 INFO 2013-02-08 16:17:53,646 [main] org.mule.transport.http.HttpMessageReceiver: Connecting clusterizable message receiver  
 INFO 2013-02-08 16:17:53,650 [main] org.mule.lifecycle.AbstractLifecycleManager: Starting: 'null'. Object is: HttpMessageReceiver  
 INFO 2013-02-08 16:17:53,650 [main] org.mule.transport.http.HttpMessageReceiver: Starting clusterizable message receiver  
 INFO 2013-02-08 16:17:53,652 [main] org.mule.module.launcher.application.DefaultMuleApplication: Reload interval: 3000  
 INFO 2013-02-08 16:17:53,698 [main] org.mule.module.management.agent.WrapperManagerAgent: This JVM hasn't been launched by the wrapper, the agent will not run.  
 INFO 2013-02-08 16:17:53,731 [main] org.mule.module.management.agent.JmxAgent: Attempting to register service with name: Mule.ldap_test:type=Endpoint,service="LDAP_ADD_IF_NOT_EXISTS",connector=connector.http.mule.default,name="endpoint.http.localhost.8082.ldap"  
 INFO 2013-02-08 16:17:53,732 [main] org.mule.module.management.agent.JmxAgent: Registered Endpoint Service with name: Mule.ldap_test:type=Endpoint,service="LDAP_ADD_IF_NOT_EXISTS",connector=connector.http.mule.default,name="endpoint.http.localhost.8082.ldap"  
 INFO 2013-02-08 16:17:53,733 [main] org.mule.module.management.agent.JmxAgent: Registered Connector Service with name Mule.ldap_test:type=Connector,name="connector.http.mule.default.1"  
 INFO 2013-02-08 16:17:53,735 [main] org.mule.DefaultMuleContext:   
 **********************************************************************  
 * Application: ldap_test                       *  
 * OS encoding: MacRoman, Mule encoding: UTF-8            *  
 *                                  *  
 * Agents Running:                          *  
 *  DevKit Extension Information                   *  
 *  Clustering Agent                         *  
 *  JMX Agent                            *  
 **********************************************************************  
 INFO 2013-02-08 16:17:53,736 [main] org.mule.module.launcher.DeploymentService:   
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 + Started app 'ldap_test'                 +  
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
 INFO 2013-02-08 16:18:29,991 [[ldap_test].connector.http.mule.default.receiver.02] org.mule.module.ldap.ldap.api.jndi.LDAPJNDIConnection: Binded to ldap://localhost with simple authentication as cn=admin,dc=example,dc=com  
 ERROR 2013-02-08 16:18:29,993 [[ldap_test].connector.http.mule.default.receiver.02] org.mule.module.ldap.ldap.api.jndi.LDAPJNDIConnection: Lookup failed.  
 javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]; remaining name 'cn=ldapUser,ou=people,dc=example,dc=com'  
      at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3092)  
      at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3013)  
      at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2820)  
      at com.sun.jndi.ldap.LdapCtx.c_getAttributes(LdapCtx.java:1312)  
      at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:213)  
      at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:121)  
      at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:109)  
      at javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:123)  
      at javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:118)  
      at org.mule.module.ldap.ldap.api.jndi.LDAPJNDIConnection.lookup(LDAPJNDIConnection.java:465)  
      at org.mule.module.ldap.LDAPConnector.lookup(LDAPConnector.java:480)  
      at org.mule.module.ldap.processors.LookupMessageProcessor$1.process(LookupMessageProcessor.java:130)  
      at org.mule.module.ldap.process.ProcessCallbackProcessInterceptor.execute(ProcessCallbackProcessInterceptor.java:18)  
      at org.mule.module.ldap.process.ManagedConnectionProcessInterceptor.execute(ManagedConnectionProcessInterceptor.java:63)  
      at org.mule.module.ldap.process.ManagedConnectionProcessInterceptor.execute(ManagedConnectionProcessInterceptor.java:21)  
      at org.mule.module.ldap.process.RetryProcessInterceptor.execute(RetryProcessInterceptor.java:68)  
      at org.mule.module.ldap.connectivity.ManagedConnectionProcessTemplate.execute(ManagedConnectionProcessTemplate.java:33)  
      at org.mule.module.ldap.processors.LookupMessageProcessor.process(LookupMessageProcessor.java:116)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.DefaultMessageProcessorChain.doProcess(DefaultMessageProcessorChain.java:93)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.AbstractInterceptingMessageProcessorBase.processNext(AbstractInterceptingMessageProcessorBase.java:105)  
      at org.mule.processor.AsyncInterceptingMessageProcessor.process(AsyncInterceptingMessageProcessor.java:98)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.DefaultMessageProcessorChain.doProcess(DefaultMessageProcessorChain.java:93)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.AbstractInterceptingMessageProcessorBase.processNext(AbstractInterceptingMessageProcessorBase.java:105)  
      at org.mule.interceptor.AbstractEnvelopeInterceptor.process(AbstractEnvelopeInterceptor.java:55)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.AbstractInterceptingMessageProcessorBase.processNext(AbstractInterceptingMessageProcessorBase.java:105)  
      at org.mule.processor.AbstractFilteringMessageProcessor.process(AbstractFilteringMessageProcessor.java:44)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.AbstractInterceptingMessageProcessorBase.processNext(AbstractInterceptingMessageProcessorBase.java:105)  
      at org.mule.construct.AbstractPipeline$1.process(AbstractPipeline.java:102)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.DefaultMessageProcessorChain.doProcess(DefaultMessageProcessorChain.java:93)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.doProcess(InterceptingChainLifecycleWrapper.java:57)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.access$001(InterceptingChainLifecycleWrapper.java:29)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper$1.process(InterceptingChainLifecycleWrapper.java:90)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.process(InterceptingChainLifecycleWrapper.java:85)  
      at org.mule.construct.AbstractPipeline$3.process(AbstractPipeline.java:194)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.SimpleMessageProcessorChain.doProcess(SimpleMessageProcessorChain.java:47)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.doProcess(InterceptingChainLifecycleWrapper.java:57)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.access$001(InterceptingChainLifecycleWrapper.java:29)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper$1.process(InterceptingChainLifecycleWrapper.java:90)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.process(InterceptingChainLifecycleWrapper.java:85)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.SimpleMessageProcessorChain.doProcess(SimpleMessageProcessorChain.java:47)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.doProcess(InterceptingChainLifecycleWrapper.java:57)  
      at org.mule.processor.chain.AbstractMessageProcessorChain.process(AbstractMessageProcessorChain.java:66)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.access$001(InterceptingChainLifecycleWrapper.java:29)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper$1.process(InterceptingChainLifecycleWrapper.java:90)  
      at org.mule.execution.ExceptionToMessagingExceptionExecutionInterceptor.execute(ExceptionToMessagingExceptionExecutionInterceptor.java:27)  
      at org.mule.execution.MessageProcessorNotificationExecutionInterceptor.execute(MessageProcessorNotificationExecutionInterceptor.java:46)  
      at org.mule.execution.MessageProcessorExecutionTemplate.execute(MessageProcessorExecutionTemplate.java:43)  
      at org.mule.processor.chain.InterceptingChainLifecycleWrapper.process(InterceptingChainLifecycleWrapper.java:85)  
      at org.mule.transport.AbstractMessageReceiver.routeMessage(AbstractMessageReceiver.java:220)  
      at org.mule.transport.AbstractMessageReceiver.routeMessage(AbstractMessageReceiver.java:202)  
      at org.mule.transport.AbstractMessageReceiver.routeMessage(AbstractMessageReceiver.java:194)  
      at org.mule.transport.AbstractMessageReceiver.routeMessage(AbstractMessageReceiver.java:181)  
      at org.mule.transport.http.HttpMessageReceiver$HttpWorker$1.process(HttpMessageReceiver.java:311)  
      at org.mule.transport.http.HttpMessageReceiver$HttpWorker$1.process(HttpMessageReceiver.java:306)  
      at org.mule.execution.ExecuteCallbackInterceptor.execute(ExecuteCallbackInterceptor.java:20)  
      at org.mule.execution.HandleExceptionInterceptor.execute(HandleExceptionInterceptor.java:34)  
      at org.mule.execution.HandleExceptionInterceptor.execute(HandleExceptionInterceptor.java:18)  
      at org.mule.execution.BeginAndResolveTransactionInterceptor.execute(BeginAndResolveTransactionInterceptor.java:58)  
      at org.mule.execution.ResolvePreviousTransactionInterceptor.execute(ResolvePreviousTransactionInterceptor.java:48)  
      at org.mule.execution.SuspendXaTransactionInterceptor.execute(SuspendXaTransactionInterceptor.java:54)  
      at org.mule.execution.ValidateTransactionalStateInterceptor.execute(ValidateTransactionalStateInterceptor.java:44)  
      at org.mule.execution.IsolateCurrentTransactionInterceptor.execute(IsolateCurrentTransactionInterceptor.java:44)  
      at org.mule.execution.ExternalTransactionInterceptor.execute(ExternalTransactionInterceptor.java:52)  
      at org.mule.execution.RethrowExceptionInterceptor.execute(RethrowExceptionInterceptor.java:32)  
      at org.mule.execution.RethrowExceptionInterceptor.execute(RethrowExceptionInterceptor.java:17)  
      at org.mule.execution.TransactionalErrorHandlingExecutionTemplate.execute(TransactionalErrorHandlingExecutionTemplate.java:113)  
      at org.mule.execution.TransactionalErrorHandlingExecutionTemplate.execute(TransactionalErrorHandlingExecutionTemplate.java:34)  
      at org.mule.transport.http.HttpMessageReceiver$HttpWorker.doRequest(HttpMessageReceiver.java:305)  
      at org.mule.transport.http.HttpMessageReceiver$HttpWorker.processRequest(HttpMessageReceiver.java:251)  
      at org.mule.transport.http.HttpMessageReceiver$HttpWorker.run(HttpMessageReceiver.java:163)  
      at org.mule.work.WorkerContext.run(WorkerContext.java:311)  
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)  
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)  
      at java.lang.Thread.run(Thread.java:680)  
 ERROR 2013-02-08 16:18:30,001 [[ldap_test].connector.http.mule.default.receiver.02] org.mule.retry.notifiers.ConnectNotifier: Failed to connect/reconnect: Work Descriptor. Root Exception was: javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]; remaining name 'cn=ldapUser,ou=people,dc=example,dc=com'; resolved object com.sun.jndi.ldap.LdapCtx@c22b29a. Type: class javax.naming.NameNotFoundException  
 ERROR 2013-02-08 16:18:30,003 [[ldap_test].connector.http.mule.default.receiver.02] org.mule.exception.CatchMessagingExceptionStrategy:   
 ********************************************************************************  
 Message        : Failed to invoke lookup. Message payload is of type: String  
 Code         : MULE_ERROR--2  
 --------------------------------------------------------------------------------  
 Exception stack is:  
 1. javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]; remaining name 'cn=ldapUser,ou=people,dc=example,dc=com'; resolved object com.sun.jndi.ldap.LdapCtx@c22b29a (javax.naming.NameNotFoundException)  
  com.sun.jndi.ldap.LdapCtx:3092 (http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/naming/NameNotFoundException.html)  
 2. [LDAP: error code 32 - No Such Object] (org.mule.module.ldap.ldap.api.NameNotFoundException)  
  sun.reflect.NativeConstructorAccessorImpl:-2 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/module/ldap/ldap/api/NameNotFoundException.html)  
 3. Failed to invoke lookup. Message payload is of type: String (org.mule.api.MessagingException)  
  org.mule.module.ldap.processors.LookupMessageProcessor:141 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/api/MessagingException.html)  
 --------------------------------------------------------------------------------  
 Root Exception stack trace:  
 javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]; remaining name 'cn=ldapUser,ou=people,dc=example,dc=com'  
      at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3092)  
      at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3013)  
      at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2820)  
   + 3 more (set debug level logging or '-Dmule.verbose.exceptions=true' for everything)  
 ********************************************************************************  
 INFO 2013-02-08 16:18:30,681 [[ldap_test].connector.http.mule.default.receiver.02] org.mule.module.ldap.LDAPConnector: Added entry cn=ldapUser,ou=people,dc=example,dc=com  
 INFO 2013-02-08 16:18:30,690 [[ldap_test].connector.http.mule.default.receiver.02] org.mule.api.processor.LoggerMessageProcessor: dn: cn=ldapUser,ou=people,dc=example,dc=com  
 sn: foobar  
 cn: foobar  
 objectClass: person  
Notice the NameNotFoundException block. This exception is handled by the CatchExceptionStrategy, and then the LDAP addition is triggered. We can also verify the result through an LDAP browser.
This is just an extremely simple example, but hopefully it gives one an idea of how easily and powerfully Mule integrates with LDAP, with minimal effort.

Friday, December 28, 2012

Use Quartz Manager to monitor Quartz jobs inside Mule Studio

Now that we can successfully schedule jobs inside Mule Studio using Quartz, the next logical question is how we can monitor these jobs to know whether they executed, and whether they succeeded or failed. Mule does not provide monitoring at such a fine-grained level. Luckily, Terracotta offers Quartz Manager as part of its enterprise offering.

One of the key challenges we need to solve is providing in-depth monitoring capabilities for our next-generation platform. Currently we use cron to schedule some of our recurring jobs. While cron is easy to set up, it provides no visibility into whether and when a job was executed, or the status of the job execution (success, failure, or still ongoing).

In this blog, we will show how to configure Mule Studio and the Quartz connector so jobs can be monitored by Quartz Manager.

From Terracotta's web site,

"Quartz Manager provides real-time monitoring and management for Quartz Scheduler. Use its rich graphical user interface to:
  • Gain immediate visibility into job schedules, status and activity
  • Readily add or modify scheduling information
  • Manage multiple instances of Quartz Scheduler through a single interface
  • Simplify ongoing management of job scheduling and execution
Quartz Manager is an enterprise-grade addition to Quartz Scheduler that comes with a commercial license and support."
We really like the fact that Quartz monitoring is JMX-based and that Quartz Manager works with an existing Quartz scheduler without any configuration changes, other than enabling JMX support.

Here are the necessary steps to integrate a Quartz end point with Quartz Manager.

Download and Install Quartz Manager

Follow the link to download and install Quartz Manager.

Enable JMX for Quartz Connector inside Mule Studio

Next, we must enable JMX both for the Quartz connector and when we run the Mule flow.

Enable JMX for Quartz Connector

Enabling JMX for the Quartz connector is accomplished by setting the following "quartz:factory-property" entries,
 <quartz:factory-property key="org.quartz.scheduler.jmx.export"  
                value="true" />  
 <quartz:factory-property key="org.quartz.scheduler.jmx.objectName"  
                value="quartz:type=QuartzScheduler,name=JmxScheduler,instanceId=NONE_CLUSTERED" />  

Replace org.quartz.scheduler.jmx.objectName's name and instanceId with one's own appropriate values.

Enable JMX Remoting when Running Mule Flow

Now we need to enable JMX remoting when running the Mule flow. Go to Eclipse->Run->Run Configurations. Find the run configuration for the existing flow under "Mule Application" and add the following entries to the VM arguments,
 -XX:PermSize=128M -XX:MaxPermSize=256M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.ssl=false  
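
As a quick, optional sanity check (not part of the original setup), a few lines of plain JMX client code, using only the standard javax.management API, can confirm that the port is reachable and the scheduler MBean was exported:

 import java.util.Set;

 import javax.management.MBeanServerConnection;
 import javax.management.ObjectName;
 import javax.management.remote.JMXConnector;
 import javax.management.remote.JMXConnectorFactory;
 import javax.management.remote.JMXServiceURL;

 public class QuartzJmxCheck {
   public static void main(String[] args) throws Exception {
     // Connects to the port opened by the -Dcom.sun.management.jmxremote.* flags above.
     JMXServiceURL url = new JMXServiceURL(
         "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
     JMXConnector connector = JMXConnectorFactory.connect(url);
     try {
       MBeanServerConnection mbsc = connector.getMBeanServerConnection();
       // Any scheduler exported with org.quartz.scheduler.jmx.export=true shows up here.
       Set<ObjectName> schedulers =
           mbsc.queryNames(new ObjectName("quartz:type=QuartzScheduler,*"), null);
       System.out.println("Quartz scheduler MBeans: " + schedulers);
     } finally {
       connector.close();
     }
   }
 }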

Upgrade Quartz JAR File in Mule Studio

Next we need to upgrade the quartz-all.jar file shipped with Mule Enterprise Studio. Currently Mule ships quartz-all-1.6.6.jar, which is very old and won't work with Quartz Manager. We can't use the most recent release of Quartz, as it breaks backward compatibility and Mule won't support it. Download quartz-all-1.8.6.jar from Terracotta's web site.

Go to the Mule Studio installation directory and replace quartz-all-1.6.6.jar with quartz-all-1.8.6.jar.

We also need to fix the built-in Mule runtime libraries' reference so it points to quartz-all-1.8.6.jar instead of quartz-all-1.6.6.jar. The Mule runtime dependency is defined in a MANIFEST.MF file under the Mule Studio installation directory. Locate this file and replace quartz-all-1.6.6.jar with quartz-all-1.8.6.jar.

Start Mule Flow and Quartz Manager

Now we can start the Mule flow and Quartz Manager.

When Quartz Manager first starts up, it asks for the JMX host and port number,

Enter the correct host name and port number.

After successfully connecting to the remote Quartz scheduler, Quartz Manager displays the following job details page,

Click on "Triggers" to show trigger details.
















Click on "Job Monitor" to see the detailed status for each executed job.



Thursday, December 27, 2012

Schedule Jobs using Mule Studio and Quartz, part II

In the previous blog, we discussed how to use an inbound Quartz end point to trigger jobs. In this blog, we discuss how to use an outbound Quartz end point to schedule jobs when some external event happens.

Below is the sample Mule flow. When someone accesses the HTTP end point, a message is put on a VM end point, which triggers the Quartz scheduler to run a job that puts a message on a different VM end point, which gets picked up by a Java transformer that will eventually process it.


Pay attention to the key difference between this flow and the inbound end point flow. In the inbound end point flow, there is no message flowing into the inbound end point. For the outbound event, a message (VM or JMS) is necessary to trigger the job.

Double click on the outbound end point and the following window shows up,


Fill in the necessary scheduling information and click on the plus sign to add a new job.



Make sure we select "quartz:schedule-dispatch-job".


Make sure "Group Name:", "Job Group Name:", and "Address:" are not filled in at all. If they are filled in, one will get some weird error that can't be made any sense of. I wish Mule Studio can be more helpful but it doesn't. I finally resolved this issue after spending hours searching and googling.

That's it. Now execute the flow and access the HTTP end point. You will see our Java transformer get executed twice after an initial delay of 15 seconds.

Schedule Jobs using Mule Studio and Quartz, Part I

There is a lot of confusion about how to use Quartz to schedule jobs inside Mule Studio, and the documentation is slim to none.
In this series of blogs, we attempt to explain and show how to use Quartz to schedule both inbound and outbound jobs, and how to enable the Mule Quartz connector so we can use Quartz Manager to monitor jobs.

From Mule's reference, "The Quartz transport provides support for scheduling events and for triggering new events. An inbound quartz endpoint can be used to trigger inbound events that can be repeated, such as every second. Outbound quartz endpoints can be used to schedule an existing event to fire at a later date. Users can create schedules using cron expressions, and events can be persisted in a database."

An inbound quartz endpoint is used when we need to trigger inbound events periodically, such as generating a predefined message or loading a file, and then passing the generated message down the flow, either directly or through a VM or JMS endpoint.

An outbound quartz endpoint is used when we need to trigger an event based on incoming events. For example, in our world, we want to send a 25%-off coupon every day at 9 am to all the users who clicked on the welcome link embedded in the previous day's email campaign.

The key difference between the inbound and outbound endpoints is that an inbound message is scheduled entirely by Quartz, while an outbound endpoint is scheduled by Quartz only after an incoming event has been triggered. We will use some examples to illustrate the differences.

Quartz Connector

But before we dive into the nitty-gritty details, we first need to create a Mule Quartz connector. A Mule Quartz connector defines how Mule should create the underlying Quartz scheduler.

Here is a sample scheduler using RAMJobStore. Enter the following block of XML in the Mule flow's configuration XML section.
   <quartz:connector name="quartzConnector_vm" validateConnections="true" doc:name="Quartz">  
     <quartz:factory-property key="org.quartz.scheduler.instanceName" value="MuleScheduler1"/>  
     <quartz:factory-property key="org.quartz.threadPool.class" value="org.quartz.simpl.SimpleThreadPool"/>  
     <quartz:factory-property key="org.quartz.threadPool.threadCount" value="3"/>  
     <quartz:factory-property key="org.quartz.scheduler.rmi.proxy" value="false"/>  
     <quartz:factory-property key="org.quartz.scheduler.rmi.export" value="false"/>  
     <quartz:factory-property key="org.quartz.jobStore.class" value="org.quartz.simpl.RAMJobStore"/>  
   </quartz:connector>  

Mule uses "quartz:factory-property" to specify all Quartz related properties.

Here is how to configure a clustered scheduler using a JDBCJobStore backed by MySQL. Enter the following block of XML in the Mule flow's configuration XML section.

      <quartz:connector name="quartzConnector_vm"  
           validateConnections="true" doc:name="Quartz">  
           <quartz:factory-property key="org.quartz.scheduler.instanceName"  
                value="JmxScheduler" />  
           <quartz:factory-property key="org.quartz.scheduler.instanceId"  
                value="_CLUSTERED" />  
           <quartz:factory-property key="org.quartz.jobStore.isClustered" value="true" />  
           <quartz:factory-property key="org.quartz.scheduler.jobFactory.class"  
                value="org.quartz.simpl.SimpleJobFactory" />  
           <quartz:factory-property key="org.quartz.threadPool.class"  
                value="org.quartz.simpl.SimpleThreadPool" />  
           <quartz:factory-property key="org.quartz.threadPool.threadCount"  
                value="3" />  
           <quartz:factory-property key="org.quartz.scheduler.rmi.proxy"  
                value="false" />  
           <quartz:factory-property key="org.quartz.scheduler.rmi.export"  
                value="false" />  
           <!-- JDBC JOB STORE -->  
           <quartz:factory-property key="org.quartz.jobStore.class"  
                value="org.quartz.impl.jdbcjobstore.JobStoreTX" />  
           <quartz:factory-property key="org.quartz.jobStore.driverDelegateClass"  
                value="org.quartz.impl.jdbcjobstore.StdJDBCDelegate" />  
           <quartz:factory-property key="org.quartz.jobStore.dataSource"  
                value="quartzDataSource" />  
           <quartz:factory-property key="org.quartz.jobStore.tablePrefix"  
                value="QRTZ_" />  
           <!-- MYSQL Data Source -->  
           <quartz:factory-property  
                key="org.quartz.dataSource.quartzDataSource.driver" value="com.mysql.jdbc.Driver" />  
           <quartz:factory-property key="org.quartz.dataSource.quartzDataSource.URL"  
                value="jdbc:mysql://localhost:3306/quartz2" />  
           <quartz:factory-property key="org.quartz.dataSource.quartzDataSource.user"  
                value="root" />  
           <quartz:factory-property  
                key="org.quartz.dataSource.quartzDataSource.password" value="root" />  
           <quartz:factory-property  
                key="org.quartz.dataSource.quartzDataSource.maxConnections" value="8" />  
           <!-- JMX Enable -->  
           <quartz:factory-property key="org.quartz.scheduler.jmx.export"  
                value="true" />  
           <quartz:factory-property key="org.quartz.scheduler.jmx.objectName"  
                value="quartz:type=QuartzScheduler,name=JmxScheduler,instanceId=NONE_CLUSTERED" />  
      </quartz:connector>  

To verify that the Quartz connector has been configured successfully, go to "Global Elements" in the Mule flow.


Click on Quartz.


Click on "Properties" tab,

Verify all Quartz properties have been properly configured.

Use Inbound Quartz End Point

An inbound quartz end point is self-triggered and will generate one or more messages, whose payload can be preconfigured or loaded from a file.

Here is a picture of the sample Mule flow. Every five seconds, the Quartz end point will generate a VM message containing "HELLO!" as its body. We have a Java transformer that listens on the same VM end point and processes the generated "HELLO!" message, as sketched below.
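
The Java transformer itself can be trivial. The sketch below is a hypothetical stand-in for the transformer in this flow (class name assumed); it just logs whatever payload the Quartz event-generator job produced and passes it along:

 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.mule.api.MuleMessage;
 import org.mule.api.transformer.TransformerException;
 import org.mule.transformer.AbstractMessageTransformer;

 public class HelloMessageTransformer extends AbstractMessageTransformer {

   private static final Log LOG = LogFactory.getLog(HelloMessageTransformer.class);

   @Override
   public Object transformMessage(MuleMessage message, String outputEncoding)
       throws TransformerException {
     // Payload is the text configured on the event-generator job ("HELLO!").
     Object payload = message.getPayload();
     LOG.info("Quartz-generated message received: " + payload);
     return payload;
   }
 }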



Double click on "Quartz_Event_Generator" and it brings up the Quartz Mule component,



Fill in the desired information so that our job will fire every five seconds. Now we need to add a new job by clicking on the plus sign.



Select "quartz:event-generator-job" and click on "Next".

Fill in "Group Name", "Job Group Name", and enter "HELLO!" for Text. We can also click on "..." next to File to load the file as the message payload.

Now we need to select our already-defined Quartz connector by clicking on the "References" tab of the Quartz component and selecting the appropriate pre-defined Quartz connector from the drop-down list for the "Connector Reference:" field.


Now we can run our Mule flow to verify that Quartz triggers a message every five seconds and that the message is processed by our Java transformer.

We will discuss Quartz outbound events in the next blog.


Monday, September 24, 2012

Riak Cluster, Sanity Check, Part Two

In this blog, we continue with our sanity checks on the Riak cluster.

Setting up Riak Cluster

I have used HBase, Cassandra, and MongoDB in the past, and I was preparing for a long, laborious effort to set up a five-node Riak cluster, but I was pleasantly surprised by how easy it was. I also found the command-line tools extremely helpful for diagnosing cluster status.

Sanity Check

Our sanity check consists of loading 20 million objects into the cluster and executing a series of GET, link-walking, and free-text-search operations over 10 concurrent threads, while physical nodes are brought down and brought back up in the middle of the performance testing.

Use HAProxy for Load Balancing and Failover

We couldn't get the Java client's cluster client to work for us, so we switched to HAProxy for failover and load balancing, and it has worked out really well for us.

When we brought down a Riak node during the performance testing, some of the in-flight queries failed but succeeded immediately after we retried with the same method call.
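
The retry logic is nothing Riak-specific. A minimal sketch of the kind of wrapper we mean (names and retry policy are illustrative, not our production code) is shown below; any fetch or store call can be wrapped in a Callable and re-issued, letting HAProxy route the retry to a healthy node.

 import java.util.concurrent.Callable;

 public final class Retry {

   // Runs the call, retrying up to maxAttempts times with a short pause between attempts.
   public static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMillis)
       throws Exception {
     Exception last = null;
     for (int attempt = 1; attempt <= maxAttempts; attempt++) {
       try {
         return call.call();
       } catch (Exception e) {
         last = e;                    // e.g. a connection reset from the node being shut down
         Thread.sleep(backoffMillis); // brief pause before HAProxy routes us to another node
       }
     }
     throw last;
   }

   private Retry() {
   }
 }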

Test Result

Our sanity check performed flawlessly when physical nodes were brought down during testing. In-flight queries failed but recovered right after we retried the same query, and there was little degradation when losing a physical node.

However, a big surprise came when we brought a physical node back up during the performance testing: it increased the overall run time by a factor of six. Baffled, I reached out to Brian at Basho, and here is his explanation,
"The reason I ask is the http/pb API listener will start up before Riak KV has finished starting. So, while these nodes will be available for requests they will be slow and build up queues as they start up needed resources. You can use the riak-admin wait-for-service riak_kv <nodename> command from any other node to check if riak_kv is up."

My follow-on question, 
"Yes, I have my performance test running, through a HA proxy that maps to 5 nodes. I understand that the initial response from the starting up node will be slow, but it is impacting the entire performance, which consists of 10 testing threads. I would rather have the node not accepting request until it is ready than accepting requests but queuing them up to drag down the cluster performance."

And Brian's response,
"I completely agree, there is currently work being done on separating the startup into two separate processes. One which will reply with 503 to all requests and then dies when KV is started and ready to take requests. Currently, even if a request is not sent to a node directly, in the case of a PUT, the data will be forwarded to the node even though KV is not ready to accept it. Depending on your W quorum this could result in a long response time for requests as the coordinating node waits for a response from the starting up node.  Currently there is no method to make a node 'invisible' to other nodes in the cluster without bringing it down or changing the cookie in vm.args."

Summary

We are pleased with our overall sanity check results, but there are definitely kinks that need to be worked out, and I am glad the Riak team is on top of things.

Riak Sanity Check, Take One

Before we can officially recommend Riak to the management team, we executed a series of sanity checks to make sure that we make a sound recommendation and that it won't come back and bite us in the rear.

Testing Scenario

Our sanity testing environment consists of 20 million objects running on a single physical node. After the 20 million objects were loaded into Riak 1.1, we executed a series of GET, link-walking, and free-text-search operations with 10 concurrent threads.

We are not interested in the actual performance numbers, but are looking for obvious bottlenecks and abnormal behavior.
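
For reference, the harness behind this kind of test is nothing fancy. A minimal sketch of the sort of driver we mean (names are illustrative, and the Riak operation itself is passed in as a Callable) looks like this:

 import java.util.concurrent.Callable;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicLong;

 // Hypothetical harness: drives a fixed number of worker threads, each issuing
 // the supplied operation (a GET, link walk, or search) in a loop.
 public class ConcurrentLoadHarness {

   public static void run(final Callable<?> operation, int threads, final int opsPerThread)
       throws InterruptedException {
     final AtomicLong failures = new AtomicLong();
     ExecutorService pool = Executors.newFixedThreadPool(threads);
     for (int t = 0; t < threads; t++) {
       pool.submit(new Runnable() {
         public void run() {
           for (int i = 0; i < opsPerThread; i++) {
             try {
               operation.call();
             } catch (Exception e) {
               failures.incrementAndGet(); // count failures but keep the load going
             }
           }
         }
       });
     }
     pool.shutdown();
     pool.awaitTermination(1, TimeUnit.HOURS);
     System.out.println("failed operations: " + failures.get());
   }
 }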

Test Result

We were surprised by the poor results exhibited by our sanity checks: Riak consumed 100% of the CPU and 75% of the memory, and queries didn't return in any reasonable amount of time.

Needless to say, we were somewhat concerned. This is where Riak's excellent support jumped into play. Riak's Developer Advocate Brian worked with us and came up with an excellent diagnosis,

"LevelDB holds data in a "young level" before compacting this data into numbered levels (Level0, Level1, Level2….) who's maximum space grow as their number iterates. In pre Riak 1.2 levelDB, the compaction operation from the young level to sorted string table's was a blocking operation. Compaction and read operations used the same scheduler thread to do their work so if a compaction operation was occurring that partition would be effectively locked. Riak 1.2 moves compaction off onto it's own scheduler thread so the vnode_worker (who is responsible for GET/PUT operations) will not be waiting for compaction to complete before read requested can be serviced by the levelDB backend (write requests are dropped in the write buffer, independent of compaction).

In your scenario, the bulk load operation caused a massive amount of compaction. Most likely, because of the large amount of objects you loaded, there was compaction occurring on all 64 partitions for a while after your write operation completed. The input data sat in the write buffer and young level and eventually compaction moved them to their appropriate levels but during this time read operations will timeout.  "

So our bulk loading led to contention on a single thread, which was responsible for both compaction and queries at the same time.

We can't upgrade to 1.2 just yet because of a bug in search, so we will have to wait for the next release.

Summary

Overall, we are not pleased with our sanity check results, but we are satisfied with the explanation.