Tuesday, July 10, 2012

Why We Chose Riak

In this post, we discuss why we chose Riak as one of the persistence storage engines for our next-generation platform. In the next post, we will show how to use the Riak Java client library to create and update objects, create secondary indexes and links, and run free text searches.

Object Model
To recap, we use a dual object model: one model for the externally facing RESTful Web Services and one internal persistence object model.

We store only JSONWrapper objects in Riak, along with the appropriate relationships and links. We also need to search for objects based on their name, type, etc.

Why We Chose Riak

I have been experimenting with Riak for the past month and have concluded that Riak is a good option for our next-generation platform, for the following reasons:

Ever-Evolving Object Model
The highly adaptive nature of our object model is not a good fit for a traditional ORM on top of an RDBMS, because the object model is highly customizable from customer to customer and may evolve from version to version. A traditional ORM would require the RDBMS schema to continuously keep up with our ever-evolving object model, demanding enormous effort from Engineering, Testing, and Operations.

The platform itself does not care about the customized, evolving properties of object types. It only needs to know a predefined set of object properties for persistence and relationship resolution; all other properties can remain opaque to it.

Riak, on the other hand, gives us the flexibility to store opaque objects. We decided to store objects as JSON rather than as serialized Java objects or XML, because JSON serialization is more flexible and more compact, and needs less storage than either alternative.
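
As a rough illustration of the "known properties plus opaque payload" idea, here is a minimal Java sketch. The field set (id, type, name) and the class shape are assumptions made for this example; the actual JSONWrapper in our platform is not shown in this post.

```java
import java.util.Objects;

// Hypothetical wrapper: only id, type, and name are known to the platform.
// The payload is customer-specific JSON that is stored and returned untouched.
final class JsonWrapper {
    private final String id;
    private final String type;
    private final String name;
    private final String payloadJson;

    JsonWrapper(String id, String type, String name, String payloadJson) {
        this.id = Objects.requireNonNull(id, "id");
        this.type = Objects.requireNonNull(type, "type");
        this.name = Objects.requireNonNull(name, "name");
        this.payloadJson = Objects.requireNonNull(payloadJson, "payloadJson");
    }

    String id() { return id; }
    String type() { return type; }
    String name() { return name; }

    // Returned exactly as supplied; the platform never parses it.
    String payloadJson() { return payloadJson; }
}
```

Because the payload is just a string, a customer can add or remove properties without any change to the storage layer.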

High Availability and Multi-Data Center Support
Riak is built as a distributed data store, with a tunable read and write replica strategy.
Riak Enterprise offers multi-data center replication.
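
To make "tunable" concrete, the sketch below shows the usual quorum arithmetic for a store that keeps N replicas, waits for R replicas on a read, and waits for W replicas on a write. The class and method names are made up for this example, and the code is plain arithmetic rather than a call into Riak.

```java
// Hypothetical helper illustrating replica quorum arithmetic.
final class Quorum {
    private Quorum() {}

    // True when any set of r replicas must share at least one member
    // with any set of w replicas, out of n replicas in total.
    static boolean replicaSetsOverlap(int n, int r, int w) {
        if (n < 1 || r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("need 1 <= r, w <= n");
        }
        return r + w > n;
    }

    // Smallest number of replicas that forms a majority of n.
    static int majority(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        return n / 2 + 1;
    }
}
```

With three replicas, reading from two and writing to two overlaps, while reading from one and writing to one does not; the latter trades consistency for lower latency and higher availability.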

Free Text Search
Riak comes with built-in free text search support (Riak Search), which offers a Solr-like interface and Lucene-style query syntax.

Adjacency Link Walking
Our object model relies on adjacency links between objects, and it is critical to be able to follow the object graph through these links. Riak offers MapReduce-based link walking, so we can easily retrieve all objects that are linked to a particular object through any number of link levels.
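
The following in-memory Java sketch shows what a multi-level link walk computes. It is only an illustration of the traversal: the class, its methods, and the example keys are invented here, and it does not use the Riak client or its link-walking API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for objects connected by adjacency links.
final class LinkWalker {
    private final Map<String, List<String>> links = new HashMap<>();

    void link(String from, String to) {
        links.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Returns every key reachable from start within maxDepth hops,
    // in breadth-first order, visiting each key at most once.
    List<String> walk(String start, int maxDepth) {
        List<String> found = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String key : frontier) {
                List<String> targets = links.getOrDefault(key, Collections.<String>emptyList());
                for (String target : targets) {
                    if (seen.add(target)) {
                        found.add(target);
                        next.add(target);
                    }
                }
            }
            frontier = next;
        }
        return found;
    }
}
```

Each loop iteration corresponds to one link level, which is roughly how a chain of link phases behaves in a link-walking request.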

Secondary Index Support
Like an RDBMS, Riak offers secondary index support in addition to primary key lookup.
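
Conceptually, a secondary index maps an index value back to the primary keys of the objects tagged with it. The sketch below models that in memory; the class is hypothetical and is not the Riak API. The index name type_bin follows Riak's convention of suffixing string-valued index names with _bin.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of primary key lookup plus secondary indexes.
final class IndexedStore {
    // primary key -> stored JSON
    private final Map<String, String> objects = new HashMap<>();
    // index name -> index value -> primary keys
    private final Map<String, Map<String, Set<String>>> indexes = new HashMap<>();

    // Stores the object and records its index entries.
    // Overwriting a key does not remove its old index entries in this sketch.
    void put(String key, String json, Map<String, String> indexEntries) {
        objects.put(key, json);
        for (Map.Entry<String, String> e : indexEntries.entrySet()) {
            indexes.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                   .computeIfAbsent(e.getValue(), v -> new TreeSet<>())
                   .add(key);
        }
    }

    // Primary key lookup; returns null when the key is absent.
    String get(String key) {
        return objects.get(key);
    }

    // Exact-match secondary index lookup; returns matching primary keys in sorted order.
    Set<String> keysByIndex(String index, String value) {
        Map<String, Set<String>> byValue = indexes.get(index);
        if (byValue == null) {
            return Collections.emptySet();
        }
        Set<String> keys = byValue.get(value);
        return keys == null ? Collections.<String>emptySet() : keys;
    }
}
```

This is how we expect to find objects by name or type without knowing their keys in advance.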

Multi-Tenant Support
Our platform must support multi-tenancy for security, partitioning, and performance reasons, which is not trivial to accomplish in an RDBMS environment.

Riak, on the other hand, namespaces data in buckets, and the objects in those buckets are distributed across the nodes of the cluster. Tenants can be mapped to buckets, and data-level security can be accomplished by securing access to buckets. If we store all of a tenant's data in the same bucket, a user can access that data only if they have access to the bucket, and cannot access objects that belong to buckets they are not permitted to use.
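
A minimal sketch of the tenant-to-bucket mapping might look like the following. The bucket naming scheme and the access check are assumptions for illustration, and the check is performed in application code here, which is one possible place to enforce it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical application-level guard for per-tenant buckets.
final class TenantBuckets {
    private final Map<String, Set<String>> bucketsByUser = new HashMap<>();

    // Assumed naming convention: one bucket per tenant.
    static String bucketFor(String tenantId) {
        return "tenant-" + tenantId;
    }

    // Grants the user access to the bucket that holds the tenant's data.
    void grant(String user, String tenantId) {
        bucketsByUser.computeIfAbsent(user, u -> new HashSet<>()).add(bucketFor(tenantId));
    }

    // A user may touch an object only if its bucket was granted to them.
    boolean canAccess(String user, String bucket) {
        Set<String> allowed = bucketsByUser.get(user);
        return allowed != null && allowed.contains(bucket);
    }
}
```

Every read or write would first call canAccess with the target bucket, so a user who was granted only one tenant's bucket can never reach another tenant's objects.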

Ad Hoc Query Support Through MapReduce
Riak gives us the ability to run ad hoc queries over the entire data set through a series of Map and Reduce phases. The only limitation is that MapReduce is executed in memory and must complete within a timeout limit. This is not a major concern given the size of our data set.
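
To show the shape of such a query, here is a small in-memory Java sketch of one map phase followed by one reduce phase that counts objects per type. It only mimics the phase structure; the class, the "type:id" key layout, and the method names are assumptions, and no Riak API is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-phase job: count stored objects per type.
final class TypeCountJob {
    private TypeCountJob() {}

    // Map phase: runs once per object and emits its type.
    // Keys are assumed to look like "type:id" purely for this example;
    // a key without ':' is emitted unchanged.
    static String map(String key) {
        int colon = key.indexOf(':');
        return colon < 0 ? key : key.substring(0, colon);
    }

    // Reduce phase: folds all emitted types into a count per type.
    static Map<String, Integer> reduce(List<String> emittedTypes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String type : emittedTypes) {
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    // Runs the map phase over every key, then a single reduce phase.
    static Map<String, Integer> run(List<String> keys) {
        List<String> emitted = new ArrayList<>();
        for (String key : keys) {
            emitted.add(map(key));
        }
        return reduce(emitted);
    }
}
```

Note that the whole intermediate list is held in memory before reducing, which is the same constraint described above.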

Riak is based on a masterless distributed model, which should perform better than a master-slave model.

Operation and Monitoring Support
Riak ships with a UI monitoring tool and a set of commands for other administrative tasks such as backup and restore.

Concerns about Riak
We do have concerns regarding Riak from a business perspective. Even though Riak is an open source solution, its commercial backer, Basho, is still relatively young, and the user community is not as big as those of Hadoop, Cassandra, or MongoDB.

To mitigate the risk, we built a persistence abstraction layer that allows us to swap Riak for a different NoSQL technology in the future if necessary.
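
The general shape of such a layer can be sketched as a small interface with interchangeable implementations. The interface below and its in-memory implementation are hypothetical and much simpler than a real abstraction layer; a Riak-backed implementation would sit behind the same interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage contract that application code depends on.
interface ObjectStore {
    void put(String bucket, String key, String json);
    Optional<String> get(String bucket, String key);
    void delete(String bucket, String key);
}

// Simple in-memory implementation, useful for tests or as a stand-in.
final class InMemoryObjectStore implements ObjectStore {
    private final Map<String, Map<String, String>> buckets = new HashMap<>();

    @Override
    public void put(String bucket, String key, String json) {
        buckets.computeIfAbsent(bucket, b -> new HashMap<>()).put(key, json);
    }

    @Override
    public Optional<String> get(String bucket, String key) {
        Map<String, String> objects = buckets.get(bucket);
        return objects == null ? Optional.empty() : Optional.ofNullable(objects.get(key));
    }

    @Override
    public void delete(String bucket, String key) {
        Map<String, String> objects = buckets.get(bucket);
        if (objects != null) {
            objects.remove(key);
        }
    }
}
```

Keeping the contract this narrow makes a swap easier, at the cost of hiding store-specific features such as link walking or search behind additional methods.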

1 comment:

  1. In my humble opinion your "concerns" about Riak are ones you'd have with many open source technologies. Mongo is no different. What other NoSQL technology, particularly ones with Riak's features, would you be able to swap out down the track?

    Database abstraction has proven to not mitigate risk in almost every case it's used. If you truly abstract your database then you end up not using all the features which make it worth using. If you don't go to those lengths you may as well not have attempted it at all as your abstraction will leak into the application codebase anyway.

    Good luck with your Riak usage. I doubt you'll be disappointed.