Thursday, July 12, 2012

Riak Java Client Distilled

In this post, we show how to use the Riak Java client to:
  • Create/update objects
  • Enable and search by secondary index
  • Add links and walk links
  • Enable and search by free text through MapReduce

Riak Configuration

There are a few configuration changes that we need to make to app.config to enable secondary indexes, enable search, and listen on all network interfaces:
  1. Change the storage back end to eLevelDB, the only storage engine that supports secondary indexes
  2. Change localhost or 127.0.0.1 to 0.0.0.0 so Riak will listen on all network interfaces
  3. Enable Riak Search by modifying the riak_search section of app.config:

 {riak_search, [
                %% To enable Search functionality set this 'true'.
                {enabled, true}
               ]}

Riak Java Client

All Riak server access is done through a Riak client.

Which Riak Client to Use
The Riak Java library offers two types of Riak clients, which can be confusing. We found that most tasks can be accomplished using the pbc (low-level Protocol Buffers) client, with the following exception, which requires the HTTP client:
  • Enable free text search for buckets
How to Obtain a Riak Client

 IRiakClient riakClient = RiakFactory.pbcClient(host, port);  

Shutdown Riak Client in the End

All active Riak clients must be shut down before shutting down the application or the Tomcat server itself.
 riakClient.shutdown();  
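
One way to make sure no client is left running is to track every client in one place and shut them all down together. The sketch below is a plain-Java illustration under assumptions: `Shutdownable` and `ClientTracker` are hypothetical names, not part of the Riak library, and the interface simply stands in for anything that exposes a no-argument `shutdown()` method.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-in for anything with a shutdown() method (hypothetical, not a Riak type).
interface Shutdownable {
    void shutdown();
}

// Hypothetical helper that remembers every client handed out so that all of
// them can be shut down from a single place when the application stops.
class ClientTracker {
    private final List<Shutdownable> clients = new CopyOnWriteArrayList<>();

    void track(Shutdownable client) {
        clients.add(client);
    }

    // Shuts down every tracked client and returns how many shut down cleanly.
    // A failure in one client does not prevent the others from being shut down.
    int shutdownAll() {
        int closed = 0;
        for (Shutdownable client : clients) {
            try {
                client.shutdown();
                closed++;
            } catch (RuntimeException e) {
                System.err.println("Client shutdown failed: " + e.getMessage());
            }
        }
        clients.clear();
        return closed;
    }
}
```

A real client could be registered with `tracker.track(riakClient::shutdown)`, and `shutdownAll()` called from whatever shutdown hook the container provides.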

Create, Update, and Lookup Object

The Riak client API offers a few annotations to mark particular fields, and we highly recommend using them rather than manipulating the metadata ourselves:

  • The Riak Key field (through @RiakKey annotation)
  • A Riak secondary index field (through @RiakIndex annotation)
  • A Riak links collection field (through @RiakLinks annotation)
When we persist an annotated object through the Riak client, the client processes the key, secondary indexes, and links first, before handing the object to Jackson to be serialized into a JSON string that is then stored in Riak. If we choose to manage object serialization and deserialization through Jackson ourselves, we must also handle metadata changes ourselves, such as secondary indexes or links being added or removed. If this is not handled carefully, we can easily lose the existing secondary indexes and links when an object is updated.

Here is an example highlighting the usage of the above annotations:

 public class JsonObject  
 {  
   @JsonProperty  
   String bucket;  
   
   @RiakKey  
   String key;  
   
   @JsonProperty  
   String name;  
   
     
   @RiakLinks  
   @JsonIgnore  
   Collection<RiakLink> riakLinks = new ArrayList<RiakLink>();  
   
     
   @RiakIndex(name = "uri")  
   @JsonProperty  
   String uriIndex;  
   
  }  

To save/update an object,

 this.riakClient.createBucket(bucket).execute().store(object).execute();  
   

To look up an object by key,
   
 @Override  
   public <T> T get(final String bucket, final String key, final Class<T> kclass)  
   {  
     try  
     {  
       return this.riakClient.fetchBucket(bucket).execute().fetch(key, kclass).execute();  
     }  
     catch (final RiakRetryFailedException e)  
     {  
       throw new RuntimeException(e);  
     }  
   }  

Secondary Index Creation and Retrieval
When a field of an object is annotated with @RiakIndex, the secondary index is automatically created or updated whenever the object is stored or updated.

To look up objects based on a secondary index,

 public List<String> fetchIndex(final String bucket, final String indexName, final String indexValue)  
   {  
     try  
     {  
   
       return this.riakClient.fetchBucket(bucket).execute().fetchIndex(BinIndex.named(indexName))  
           .withValue(indexValue).execute();  
     }  
     catch (final RiakException e)  
     {  
       throw new RuntimeException(e);  
     }  
   
     // Collection<String> collection = results.getResult(String.class);  
   }  

Riak Search
Riak Search must be enabled at the bucket level before Riak will index the properties of the objects in the bucket. To enable search on a bucket, run:

 bin/search-cmd install my_bucket_name  

To execute a Riak search on a given bucket,
 @Override  
   public Collection<JsonObject> search(final String bucket, final String criteria)  
   {  
     try  
     {  
       final MapReduceResult mapReduceResult = this.riakClient.  
           mapReduce(bucket, criteria)  
           .addMapPhase(new NamedJSFunction("Riak.mapValuesJson")).execute();  
       return mapReduceResult.getResult(JsonObject.class);  
     }  
     catch (final Exception e)  
     {  
       throw new RuntimeException(e);  
     }  
   }  
Here the bucket parameter is the bucket name and the criteria parameter is the search criteria, such as "type=Folder" or "(type=Folder AND name=Hello)".

Riak Link Walking
Riak link walking apparently only works with the HTTP client, not the pbc client, for some reason.

Here is sample code that link walks a specific number of steps from the current object, identified by key.
  @Override  
   public List<List<String>> walk(  
                   final String bucket, // bucket name  
                   final String key,  // originating object key  
                   final String linkName,  // link name  
                   final int steps   // number of steps to walk. Riak will stop if it can't walk further  
                   )  
   {  
     final List<List<String>> walkResults = new ArrayList<List<String>>();  
   
     try  
     {  
       final LinkWalk linkWalk = this.riakHttpClient.walk(this.riakHttpClient.createBucket(bucket).execute().fetch(key)  
           .execute());  
   
       for (int i = 0; i < steps; i++)  
       {  
         linkWalk.addStep(bucket, linkName, true);  
       }  
       final WalkResult walkResult = linkWalk.execute();  
   
       final Iterator<Collection<IRiakObject>> it = walkResult.iterator();  
       while (it.hasNext())  
       {  
         final List<String> list = new ArrayList<String>();  
         final Collection<IRiakObject> collections = it.next();  
   
         for (final IRiakObject riakObject : collections)  
         {  
           list.add(riakObject.getKey());  
         }  
         if (list.size() > 0)  
         {  
           walkResults.add(list);  
         }  
       }  
     }  
     catch (final Exception e)  
     {  
       throw new RuntimeException(e);  
     }  
   
     return walkResults;  
   }  
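
To make the shape of the returned `List<List<String>>` easier to picture, here is a plain-Java analogy of a step-by-step walk over an in-memory link map. It does not use the Riak client, and `LinkWalkSketch` and its `links` map are hypothetical. The assumption is that each step follows the links of the keys reached in the previous step and that the walk stops early when nothing further can be reached.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory analogy only: links.get(key) lists the keys that "key" links to
// under a single link name.
class LinkWalkSketch {
    static List<List<String>> walk(Map<String, List<String>> links, String startKey, int steps) {
        List<List<String>> results = new ArrayList<>();
        Set<String> frontier = new LinkedHashSet<>(Collections.singletonList(startKey));

        for (int i = 0; i < steps; i++) {
            // Collect every key reachable in one hop from the current frontier.
            Set<String> next = new LinkedHashSet<>();
            for (String key : frontier) {
                next.addAll(links.getOrDefault(key, Collections.emptyList()));
            }
            if (next.isEmpty()) {
                break; // nothing further to walk
            }
            results.add(new ArrayList<>(next)); // one list of keys per step
            frontier = next;
        }
        return results;
    }
}
```

Each inner list holds the keys reached at one step, so a walk that runs out of links returns fewer lists than the number of steps requested.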

Tuesday, July 10, 2012

Why We Chose Riak


In this post, we discuss why we chose Riak as one of the persistent storage engines for our next-generation platform. In the next post, we will show how to use the Riak Java client library to create and update objects and to work with secondary indexes, links, and free text search.

Object Model
To recap, we have a dual object model: one model for the external-facing RESTful Web Services and one for internal persistence, shown below.

We store only JSONWrapper objects in Riak, along with the appropriate relationships and links. We also need to search for objects based on their name, type, etc.

Why We Chose Riak

I have been playing around with Riak for the past month and have come to the conclusion that Riak is a good option for our next-generation platform, for the following reasons:

Ever-Evolving Object Model
The highly adaptive nature of our object model is not a good fit for a traditional ORM on top of an RDBMS, as the object model is highly customizable from customer to customer and may evolve from version to version. A traditional ORM would require the RDBMS schema to continuously keep up with our ever-evolving object model, requiring enormous effort from Engineering, Testing, and Operations.

The platform really does not care about the customized and highly evolved properties of object types. In other words, the platform only needs to know a pre-defined set of object properties for persistence and relationship resolution purpose and does not need to know all the other properties.

Riak, on the other hand, gives us the flexibility to store opaque objects. We decided to store objects as JSON rather than as serialized Java objects or XML, because JSON serialization is more flexible and compact and needs far less storage than either.

High Availability and Multi-Data Center Support
Riak is built as a distributed data store, with a tunable read and write replica strategy.
Riak Enterprise offers multi-data center replication.

Free Text Search 
Riak comes with built-in free text search support, with a query syntax modeled on Lucene.

Adjacency Link Walking
Our object model relies on adjacency links between objects, and it is critical to be able to follow the object graph through these links. Riak offers MapReduce-based link walking, so we can easily retrieve all objects that are linked to a particular object through any number of levels of links.

Secondary Index Support
Like an RDBMS, Riak offers secondary index support in addition to primary key lookup.

Multi-Tenant Support
Our platform must support multi-tenancy for security, partitioning, and performance reasons, which is not trivial to accomplish in an RDBMS environment.

Riak, on the other hand, partitions data naturally into buckets, and buckets are distributed across different nodes. Tenants can be mapped to buckets, and data-level security can be accomplished by securing access to buckets. If we store all of a tenant's data in the same bucket, a user can access that data only if they have access to the bucket, and cannot access any objects that do not belong to an accessible bucket.
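
As a rough illustration of the tenant-to-bucket idea, the sketch below keeps a map from users to the buckets they may touch and checks it before any read or write. It is a minimal sketch under assumptions: `TenantAccess` is a hypothetical class, and the check is assumed to live in the platform's own code in front of the storage calls.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical access check: one bucket per tenant, users are granted buckets.
class TenantAccess {
    // user name -> buckets (tenants) the user is allowed to access
    private final Map<String, Set<String>> grants = new HashMap<>();

    void grant(String user, String tenantBucket) {
        grants.computeIfAbsent(user, u -> new HashSet<>()).add(tenantBucket);
    }

    // True only if the user was explicitly granted the bucket.
    boolean canAccess(String user, String bucket) {
        return grants.getOrDefault(user, Collections.emptySet()).contains(bucket);
    }
}
```

Because every object of a tenant lives in that tenant's bucket, a single check on the bucket name is enough to decide whether a user may see an object.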

Ad Hoc Query Support Through MapReduce
Riak gives us the ability to run ad hoc queries over the entire data set through a series of Map and Reduce phases. The only limitation is that MapReduce is executed in memory and must complete within a timeout limit. This is not a major concern given the size of our data set.

Performance
Riak is based on a distributed data model, which should perform better than master-slave type of model.

Operation and Monitoring Support
Riak ships with a UI monitoring tool and a set of commands for other administrative tasks such as backup and restore.

Concerns about Riak
We do have concerns regarding Riak from a business perspective. Even though Riak is an open source solution, its commercial backer Basho is still relatively young and the user community is not as big as those of Hadoop, Cassandra, or MongoDB.

To mitigate the risk, we built a persistence abstraction layer that allows us to swap Riak for a different NoSQL technology in the future if necessary.
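
A persistence abstraction of this kind could look roughly like the sketch below. The names `ObjectStore` and `InMemoryObjectStore` are hypothetical and the method set is an assumption; the point is only that the platform codes against a storage-neutral interface, with a Riak-backed class as one implementation among possible others.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage-neutral interface. A Riak-backed class would be one
// implementation; another NoSQL store could be plugged in behind the same API.
interface ObjectStore {
    void put(String bucket, String key, String json);

    Optional<String> get(String bucket, String key);

    void delete(String bucket, String key);
}

// In-memory stand-in, handy for tests and for showing that callers do not
// depend on any particular storage technology.
class InMemoryObjectStore implements ObjectStore {
    private final Map<String, Map<String, String>> buckets = new ConcurrentHashMap<>();

    @Override
    public void put(String bucket, String key, String json) {
        buckets.computeIfAbsent(bucket, b -> new ConcurrentHashMap<>()).put(key, json);
    }

    @Override
    public Optional<String> get(String bucket, String key) {
        Map<String, String> objects = buckets.get(bucket);
        return objects == null ? Optional.empty() : Optional.ofNullable(objects.get(key));
    }

    @Override
    public void delete(String bucket, String key) {
        Map<String, String> objects = buckets.get(bucket);
        if (objects != null) {
            objects.remove(key);
        }
    }
}
```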




Monday, July 9, 2012

Building an Adaptive Object Model

In this series of posts, we will discuss how we built our next-generation platform using Jersey, Jackson, JSON, and Riak. But first, we will show how to build an adaptive object model that supports multiple versions of object types simultaneously.

Object Model Requirement

For our platform, we store various types of configuration objects, with parent/child relationships linking the objects together. Each object type has a set of fixed, pre-defined properties and a set of custom properties, which can vary from customer to customer.

Support Versioned Objects

As our platform evolves, our object model will need to adapt and evolve, which means we need to support and store different versions of the same object type. Properties can be added or removed between versions.

Building an Adaptive Object Model

After some exploring, we decided to go with two sets of object models: an explicit object model for Web Services and a generic persistence object model.

Here is a picture of the two object models,

The generic object graph approach consists of two layers of abstraction: a wrapper object and an inner JSON object. The wrapper layer contains static information that does not change from one version to another:

  • Object name
  • Object key
  • Object type
  • Object uri
  • Object version
  • A relationship map
  • Parent id
and a map representation of the inner JSON object.

The inner JSON object is the JSON object created by the user or returned to the user. It can differ from version to version and from type to type. The platform does not need to know what is actually stored in the inner JSON object, other than a set of standard fields:
  • Object name
  • Object type
  • Object version
The platform therefore stores just the wrapper object, with the inner object as an opaque map. Since the wrapper object's structure does not change based on version or type, and the inner object is stored as a generic Map type, the platform does not need to change every time we add a new object type or change an existing one.
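
The two layers can be pictured with a small Java class. This is only a sketch under assumptions: the class name `WrapperSketch`, the field names, the `String`-to-`String` relationship map, and the `"name"`, `"type"`, and `"version"` keys are illustrative choices, not the platform's actual definitions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative wrapper: fixed fields the platform understands, plus the
// user's JSON kept as an opaque map that the platform never interprets.
class WrapperSketch {
    final String name;
    final String key;
    final String type;
    final String uri;
    final String version;
    final String parentId;
    final Map<String, String> relationships = new HashMap<>();
    final Map<String, Object> inner; // opaque to the platform

    WrapperSketch(String key, String uri, String parentId, Map<String, Object> inner) {
        this.key = key;
        this.uri = uri;
        this.parentId = parentId;
        // Only the standard fields are read from the inner object.
        this.name = Objects.toString(inner.get("name"), null);
        this.type = Objects.toString(inner.get("type"), null);
        this.version = Objects.toString(inner.get("version"), null);
        // Defensive copy so later changes by the caller do not leak in.
        this.inner = Collections.unmodifiableMap(new HashMap<>(inner));
    }
}
```

Custom properties such as a hypothetical `"color"` entry travel inside `inner` untouched, which is why adding a property to an object type requires no change to the wrapper.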

Object Validation

However, we still need to validate the user-supplied JSON object to make sure that it matches the correct version of the object type. We accomplish this by creating a set of validation classes and registering them by type and version. When we receive an object creation request, we first deserialize the input JSON to a raw Map and pull the type and version out of the map. We then look up the validation class for that type and version and deserialize the input JSON again using the validation class.

When a new type or a new version of an existing type is introduced, we just update the validation map with the new configuration and deploy the new classes; nothing else needs to change.
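
The lookup step described above can be sketched as a small registry keyed by type and version. All names here (`ValidatorRegistry`, `FolderV1`, `FolderV2`, and the `"type"` and `"version"` map keys) are hypothetical, and the second Jackson deserialization pass is left out so the example depends only on the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry mapping (type, version) to the class used to
// validate, that is re-deserialize, the incoming JSON.
class ValidatorRegistry {
    private final Map<String, Class<?>> validators = new HashMap<>();

    private static String keyOf(String type, String version) {
        return type + "@" + version;
    }

    void register(String type, String version, Class<?> validationClass) {
        validators.put(keyOf(type, version), validationClass);
    }

    // Takes the raw map from the first deserialization pass and returns the
    // class to use for the second pass, if one is registered.
    Optional<Class<?>> lookup(Map<String, Object> rawJson) {
        Object type = rawJson.get("type");
        Object version = rawJson.get("version");
        if (type == null || version == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(validators.get(keyOf(type.toString(), version.toString())));
    }
}

// Placeholder validation classes for two versions of a hypothetical type.
class FolderV1 {
    String name;
}

class FolderV2 {
    String name;
    String owner;
}
```

Supporting a new version then amounts to one more `register` call plus the new class, which matches the deployment story above.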

Sequence Diagram for Web Services

Create a New Object through Web Services.

Get an Object through Web Services.