Archive for August, 2011

Some Cassandra gotchas

Here are some points to keep in mind when working with Cassandra.

  • You have to use an order-preserving partitioner if you want row keys in sorted order. Be aware that if the row keys are not evenly distributed, this creates hot spots, since most of the rows will be concentrated on a few nodes.
  • Columns are sorted using the comparator given when creating the column family, but row keys are not sorted according to it.
  • If you see only hex values for column names and values in cassandra-cli results, use ‘as’ and ‘assume’ to get human-readable values. See this thread.
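
The hot-spot effect in the first point can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `PartitionerDemo`, the four bucket "nodes", and the `user00042`-style keys are all made up for illustration, and the placement rules are simplifications rather than Cassandra's actual token assignment. The hashed variant uses an MD5 digest, in the spirit of a random partitioner.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Toy model only: a "node" here is just a bucket index, not a real token range.
final class PartitionerDemo {
    static final int NODES = 4;

    // Order-preserving placement: the first byte of the (non-empty) key picks
    // the node, so keys that sort next to each other also live together.
    static int orderedNode(String key) {
        int first = key.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
        return first * NODES / 256;
    }

    // Hash-based placement: an MD5 digest of the key picks the node.
    static int hashedNode(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).mod(BigInteger.valueOf(NODES)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts how many of the generated keys land on each node.
    static int[] distribute(boolean ordered, int keyCount) {
        int[] counts = new int[NODES];
        for (int i = 0; i < keyCount; i++) {
            String key = String.format("user%05d", i);
            counts[ordered ? orderedNode(key) : hashedNode(key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("ordered: " + Arrays.toString(distribute(true, 1000)));
        System.out.println("hashed:  " + Arrays.toString(distribute(false, 1000)));
    }
}
```

Because every key shares the prefix `user`, the order-preserving rule sends all of them to the same bucket, while the hashed rule spreads them across the buckets at the cost of losing sorted order.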


For the past several weeks I have been writing a synchronization library in my free time. It can be used in a distributed manner, meaning several nodes in a cluster can use it to synchronize with each other. The implementation is based on Apache Zookeeper. Though it is primarily Zookeeper based, I have also left room for other implementations; all you have to do is implement a couple of interfaces. Let’s look at some features of the library, followed by an example of its usage.

  • The library API closely follows the java.util.concurrent API, so this should be a natural transition for developers familiar with the concurrent package.
  • The synchronization granularity is thread level, not node level, so this library can also be used for in-VM synchronization between threads. If you only require in-VM synchronization, you are better off with java.util.concurrent for performance reasons. But you already knew that. :) Still, this can be handy if you have several application threads contending for the same distributed, shared resource and you need to enforce mutual exclusion for each user, whether it is a thread or a node.
  • Re-entrancy is implemented at thread level in the reentrant synchronization primitives, for example in ReentrantLock. Again, this is in line with the semantics of java.util.concurrent.
  • There are also several places where the semantics differ from the java.util.concurrent package due to the distributed nature of the library. For example, in CyclicBarrier the number of parties that trigger the barrier may differ from what you pass at the initialization of the CyclicBarrier instance. If the CyclicBarrier instance points to an already existing barrier Zookeeper node, it takes the existing barrier’s number of parties. So a getParties() call after the barrier initialization will reveal the true number of parties required to trigger the barrier. This is necessary because otherwise the barrier would not be in a consistent state if each joining party specified a different number of parties.
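
The "existing barrier wins" rule from the last point can be sketched as follows. This is a hypothetical stand-in and not the dsync implementation: the class `SharedBarrierSketch` is invented for illustration, and an in-memory map plays the role of the Zookeeper node that stores the party count.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: REGISTRY stands in for the barrier nodes in Zookeeper.
final class SharedBarrierSketch {
    private static final ConcurrentMap<String, Integer> REGISTRY =
            new ConcurrentHashMap<>();

    private final int parties;

    SharedBarrierSketch(String path, int requestedParties) {
        if (requestedParties <= 0) {
            throw new IllegalArgumentException("parties must be positive");
        }
        // putIfAbsent stores our value only if the path is new; otherwise it
        // returns the value registered by whoever created the barrier first.
        Integer existing = REGISTRY.putIfAbsent(path, requestedParties);
        this.parties = (existing != null) ? existing : requestedParties;
    }

    // Reports the party count actually in effect, which may differ from the
    // constructor argument if the barrier already existed.
    int getParties() {
        return parties;
    }

    public static void main(String[] args) {
        SharedBarrierSketch first = new SharedBarrierSketch("/barrier", 3);
        SharedBarrierSketch second = new SharedBarrierSketch("/barrier", 5);
        System.out.println(first.getParties());  // 3
        System.out.println(second.getParties()); // 3, the existing barrier's value
    }
}
```

The second instance asks for 5 parties but joins the barrier that already exists at the same path, so both instances agree on 3.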

Now let’s look at a sample usage of the library with the help of ReentrantLock.


// Zookeeper configuration to connect to the Zookeeper instance
ZKConfiguration config = new ZKConfiguration("localhost:2181", 1000000, null);

// Get Zookeeper specific Lock factory.
LockFactory fac = ZKFactory.getInstance(config);

// Get the reentrant lock from the factory on specified Zookeeper
// node. Others will also get the lock on this same node in order
// to synchronize with each other.
Lock lock = fac.getReentrantLock("/test");

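// From here onwards it's pretty much the java.util.concurrent API.
try {
   lock.lock();
   try {
      // Do mutually exclusive work. Write to db etc.
   } finally {
      // Always release the lock, even if the work above throws.
      lock.unlock();
   }
} catch (LockException e) {
   // The lock could not be acquired, so the protected work was skipped.
   e.printStackTrace();
}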
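
Since the library follows java.util.concurrent semantics, the thread-level re-entrancy mentioned earlier can be illustrated with the JDK's own ReentrantLock. This sketch uses only JDK classes and makes no claim about dsync internals; the class name `ReentrancyDemo` and its methods are made up for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustration with the JDK lock only; a distributed lock with the same
// semantics would let the owning thread re-acquire in the same way.
final class ReentrancyDemo {
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Acquires the lock, then calls a method that acquires it again.
    static int outer() {
        LOCK.lock();
        try {
            return inner();
        } finally {
            LOCK.unlock();
        }
    }

    // Re-acquires the lock on the same thread without blocking and reports
    // how many times the current thread holds it.
    static int inner() {
        LOCK.lock();
        try {
            return LOCK.getHoldCount();
        } finally {
            LOCK.unlock();
        }
    }

    // True once every lock() has been matched by an unlock().
    static boolean isFree() {
        return !LOCK.isLocked();
    }

    public static void main(String[] args) {
        System.out.println("hold count inside inner(): " + outer());
        System.out.println("free afterwards: " + isFree());
    }
}
```

Each lock() must be paired with an unlock(); the lock is released for other threads only when the hold count drops back to zero.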

For now, usage of the API can be found in the sources of the unit tests, until I get around to documenting it in the wiki. :) Currently the following synchronization primitives are available.

  • ReentrantLock
  • ReadWriteLock
  • CyclicBarrier
  • DoubleBarrier

I am hoping to add a couple more synchronization primitives as well. The project can be found at https://github.com/chamibuddhika/dsync. Any suggestions are welcome, as always.
