Category Archives: Replication

Sync: Operational Transformation vs. Conflict-free Replicated Data Types (CRDTs)

I need a solution for data sync/replication of offline data that doesn’t require my team to read whitepapers and understand theoretical mathematics.

There is an ongoing argument over whether Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs) are the way to go here.  Both technologies are intended to solve the thorny problem of handling conflicts (or removing the potential for them) when multiple parties work on the same data without direct awareness of each other’s efforts (perhaps because they are working at different times or in different places).

Maturity

I really like the idea of CRDTs, but there isn’t really a practical (or at least popular) implementation of full document CRDTs (think JSON) that I know of right now.  There also seems to be the (old) problem where the CRDTs are replicating correctly, but we are asking them to do the wrong thing… To be a little more clear, it is difficult to express the users’ intent in a data structure that also prevents conflicts.  We can get eventual consistency between the two bodies of data, but what would the two (or more) parties have created if they did it together side-by-side?
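To make the "replicating correctly, but doing the wrong thing" point concrete, here is a minimal sketch using a last-writer-wins register, one of the simplest CRDT-style merges. The function name, the field names and the sample titles are all made up for illustration; this is not taken from any particular library.

```javascript
// Hypothetical last-writer-wins (LWW) register. All names here are
// illustrative assumptions, not the API of any real CRDT library.
function lwwMerge(a, b) {
  // The write with the higher timestamp wins.
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  // Deterministic tie-break so every replica picks the same winner.
  return a.replicaId > b.replicaId ? a : b;
}

// Two people rename the same document while offline.
const alice = { value: "Q3 Budget (draft)", timestamp: 10, replicaId: "alice" };
const bob = { value: "Q3 Budget, approved", timestamp: 11, replicaId: "bob" };

// Each replica merges in a different order...
const onAlice = lwwMerge(alice, bob);
const onBob = lwwMerge(bob, alice);

// ...and both end up with the same value, so the replicas are consistent.
console.log(onAlice.value === onBob.value);
// But Alice's edit is simply gone, which may not be what either user wanted.
console.log(onAlice.value);
```

The merge is commutative, so the replicas converge no matter which order they sync in. Whether silently dropping Alice's title matches what the two of them would have agreed on side-by-side is a separate question, and the data structure can't answer it.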

This is a problem that has been explored a little more in the world of Operational Transformation so the solutions (that I am aware of) are a little more mature.

Sync via peer to peer?

The primary downside of OT (that I can see) is that there really needs to be a single source of truth (think server) with Operational Transformations whereas CRDTs allow full mesh or peer-to-peer (P2P) sync.
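As a rough illustration of why OT tends to lean on a central server, here is a minimal sketch of transforming two concurrent text inserts. The operation shape ({ pos, text }), the function names and the tie-break flag are assumptions for this example only; real OT libraries handle many more operation types and edge cases.

```javascript
// Apply an insert operation { pos, text } to a string.
function apply(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied after `other` has already been applied.
// `opWinsTies` decides who goes first when both insert at the same position;
// in a server-based setup the server's ordering would settle this.
function transformInsert(op, other, opWinsTies) {
  if (other.pos < op.pos || (other.pos === op.pos && !opWinsTies)) {
    // `other` inserted text before us, so shift our position right.
    return { pos: op.pos + other.text.length, text: op.text };
  }
  return op;
}

const base = "sync";
const a = { pos: 0, text: "data " };   // client A types at the start
const b = { pos: 4, text: " engine" }; // client B types at the end

// One side applies a first, then the transformed b.
const viaA = apply(apply(base, a), transformInsert(b, a, false));
// The other side applies b first, then the transformed a.
const viaB = apply(apply(base, b), transformInsert(a, b, true));

console.log(viaA); // "data sync engine"
console.log(viaB); // "data sync engine"
```

Both orders produce the same document, but only because both sides agree on the tie-break. With a single server deciding the order of operations that agreement comes for free; in a full mesh, every peer has to reach it some other way.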

Because P2P communication is almost as difficult right now as sync itself, it may just be practical to work with OT.

Some libraries to look at

I have been playing a bit with ShareDB.  This seems to be the best OT work going on in JavaScript right now.  That said, there isn’t a huge community around the library, and the owners (though amazing and brilliant people with other real jobs to keep down) do not appear to be super responsive to pull requests and issues.

If you are looking at doing the P2P thing, it seems like Scuttlebutt is a protocol/replication technology that is getting a bit of traction.  I believe it is inherently duplex though… so YMMV.  Here is a JavaScript implementation that might interest you.

 

Synchronization Using Interval Tree Clocks in JavaScript

As a follow-up to my previous post, I’ve implemented the Interval Tree Clock code in JavaScript with tests.  I’ve also begun a synchronization framework to go with it.

GitHub: ITC in JavaScript

The framework would be for synchronizing documents in full mesh mode — so peer to peer.

Everything has tests, so you can easily see the direction and progress by just reviewing the tests.

I stab the Synchronization with big knife

Synchronization With Interval Tree Clocks

Sync Problems

I’ve been working with mobile devices for a long time, and inevitably the most painful piece of the development process is getting data to be consistent across all replicas.
For years, I’ve been trying to find a consistent means of taking care of this in a way which is OS and repository agnostic for all replicas. It isn’t 100% clear to me why this isn’t a solved problem, but I have a feeling there are several contributing factors:

  1. Internecine conflict between all relevant parties.
  2. Rapidly changing means and standards for data storage and transmission.
  3. Figuring out causal relationships between data on different replicas is really, really difficult.

It seems to me that numbers 1 and 2 have become somewhat better lately because of ubiquitous JavaScript.  I’m not saying it’s trivial, but you can make an app that works just about everywhere now if you write it in HTML and JavaScript.

When dealing with data, browser based apps are still likely to be a problem with large data sets and long periods without connectivity, but it might be worth exploring the possibilities again.

To this end, I’ve been looking at solving the causal problem with Interval Tree Clocks (ITCs) lately.  They are interesting in the way that licking battery terminals is interesting.  They are painfully tedious, but if you can stick with it, you may eventually power a solution (or be brain damaged).

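For a long time, I think the standard way to handle the problem of causal relationships has been vector clocks, but they have well-documented limitations around space usage: a vector clock carries one entry per replica, so it keeps growing as replicas come and go.  Interval Tree Clocks were designed to avoid that by letting replicas fork and retire their identifiers.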
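For anyone who hasn't worked with vector clocks, here is a minimal sketch of how they answer the "which edit came first?" question. It is only meant as a baseline for comparison; it is not the ITC algorithm, and the function names and replica ids ("phone", "laptop") are made up for the example.

```javascript
// A vector clock here is a plain object: { replicaId: counter }.
// Names are illustrative; this is not the ITC algorithm.

// Record a local event on replica `id`.
function tick(clock, id) {
  return { ...clock, [id]: (clock[id] || 0) + 1 };
}

// Combine two clocks by taking the larger counter for each replica.
function merge(a, b) {
  const out = { ...a };
  for (const id of Object.keys(b)) {
    out[id] = Math.max(out[id] || 0, b[id]);
  }
  return out;
}

// Returns "before", "after", "equal" or "concurrent" for a relative to b.
function compare(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const id of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[id] || 0;
    const y = b[id] || 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

const base = { phone: 1 };
const phoneEdit = tick(base, "phone");   // { phone: 2 }
const laptopEdit = tick(base, "laptop"); // { phone: 1, laptop: 1 }

console.log(compare(base, phoneEdit));       // "before"
console.log(compare(phoneEdit, laptopEdit)); // "concurrent"
console.log(compare(merge(phoneEdit, laptopEdit), phoneEdit)); // "after"
```

Notice that the clock picks up a new key for every replica that ever makes an edit, and nothing in this scheme ever removes one. That is the space problem in miniature.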

Also, you can make pretty diagrams with ITCs.

ITC Node Diagram

So I’ve been trying to rewrite the ITC algorithm in C#.  This may seem ironic since I just told you that JavaScript seems to be one solution to some of the industry’s synchronization problems, but the reality is that I’m much better at exploring ideas with type-safe code.

I’ve gotten most of the C# working, and I’ve created tests.  My intent is to use those to safely port the C# over to JavaScript.

You can check the code out here.

If you prefer Java, Erlang or C, there is a repository from the original designers of the algorithm here.  A word of warning: if you try to use that repository to follow along with my code, it will be very difficult.  Conceptually, the code is somewhat similar to what I have written, but my implementation is almost entirely different.