Category Archives: Synchronization

Sync: Operational Transformation vs. Conflict-Free Replication Data Types (CRDTs)

I need a solution for data sync/replication of offline data that doesn’t require my team to read whitepapers and understand theoretical mathematics.

There is an argument going on right now as to whether Operational Transformations (OT) or Conflict-Free Replication Data Types  (CRDTs) are the way to go here.  Both technologies are intended to solve the thorny problem of handling (or removing the potential) of conflicts when multiple parties are working on the same data without direct awareness of the efforts of another party or parties (perhaps because of temporal or location differences).

Maturity

I really like the idea of CRDTs, but there isn’t really a practical (or at least popular) implementation of full document CRDTs (think JSON) that I know of right now.  There also seems to be the (old) problem where the CRDTs are replicating correctly, but we are asking them to do the wrong thing… To be a little more clear, we are having difficulty getting the intent of the users expressed in the data structure that prevents conflicts.  We can get eventual consistency between the two bodies of data, but what would the two (or more) parties have created if they did it together side-by-side?

This is a problem that has been explored a little more in the world of Operational Transformation so the solutions (that I am aware of) are a little more mature.

Sync via peer to peer?

The primary downside of OT (that I can see) is that there really needs to be a single source of truth (think server) with Operational Transformations whereas CRDTs allow full mesh or peer-to-peer (P2P) sync.

Because P2P communication is almost as difficult right now as sync itself, it may just be practical to work with OT.

Some libraries to look at

I have been playing a bit with sharedb.  This seems to be the best OT work going on in JavaScript right now.  That said, there isn’t a huge community around the library and the owner’s (though amazing and brilliant people with other real jobs to keep down) do not appear to be super responsive to pull requests and issues.

If you are looking at doing the P2P thing, it seems like Scuttlebutt is a protocol/replication technology that is getting a bit of traction.  I believe it is inherently duplex though… so YMMV.  Here is a JavaScript implementation that might interest you.

 

DerbyJs or Racer on Windows

You could argue this just isn’t meant to be … and you might be right.  Unfortunately for me, flailing around in my Ubuntu VM is just a slow way for me to develop.  I know that makes me a terrible person, but sometimes the tools you use every day belong to the dark side.

In any case, I set out to get DerbyJs and Racer working on my windows machine.

DerbyJs and Racer are created by pretty much the same group of people.  They are JavaScript frameworks that run on Node.Js and use Operational Transformation to synchronize data in real-time across clients.

The tweaks

The first smack in the face is Redis.  You’ll need to install the windows version of that here, but that’s not remotely the hard part.  The difficulty is when you try to do your npm install for the example repositories of DerbyJs or Racer  .  Then all hell breaks loose, and you run into the “we don’t support windows” contingent with npm install redis.

Turns out the lovely people at hiredis do support windows though.  So here’s the trick(s).  Install hiredis globally FIRST (npm install -g hiredis).  Then go into your global npm cache and copy everything from %appdata%\npm\node_modules\hiredis\build\Release one directory down to %appdata%\npm\node_modules\hiredis\build (because REASONS).

Now npm install type things will magically start working — if you’ve put %appdata%\npm\node_modules into your system environment variables as NODE_PATH.  NPM doesn’t do that on installation because… it’s fun to google?

System Node Environment Variable
How to set your npm cache path

 

Now after all this effort  (at least at the time of this writing)  nothing will work.  You’ve got a couple more tweaks to make.

For Racer

You need to go in and remove all the “release kind of like this one” stuff from your package.json. (See JSON below with red things to remove.)  This is because the current release of Racer will not work with the example code.  No idea why because the problem seems to be somewhere in the Operational Transformation code — which is fiddly stuff.

"dependencies": {
"coffeeify": "~0.6.0",
"express": "~3.4.8",
"handlebars": "^1.3.0",
"livedb-mongo": "~0.4.0",
"racer": "^0.6.0-alpha32",
"racer-browserchannel": "^0.3.0",
"racer-bundle": "~0.1.1",
"redis": "^2.4.2"
}

BTW, below is, I think, my favorite bit of code ever.  When running the “Pad” example of Racer, it is fired over and over by a dependency (called syntax-error) of browserfy:

module.exports = function (src, file) {
if (typeof src !== 'string') src = String(src);
try {
eval('throw "STOP"; (function () { ' + src + '})()');
return;
}
catch (err) {
if (err === 'STOP') return undefined;
if (err.constructor.name !== 'SyntaxError') throw err;
return errorInfo(src, file);
}
};

I’m sure there is a good reason for it, but I can’t fathom it myself.  I had to comment it out.

 Now for DerbyJs

There is some sort of issue with how it is creating the paths for your views.  To get it to work, there is a patch you’ll need to add to your package.json.

After the patch is installed, you’ll need to go into the index.js of each of the examples and add it before the require(‘derby’):

require('derby-patch');
var app = module.exports = require('derby').createApp('hello', __filename);

I think that should do it.  I was able to get things running fairly well after that.

One last thing, if you happen to be using Visual Studio, I can recommend the Node.Js Tools for Visual Studio with some confidence now.  During the beta stage they were pretty bad about crashing my IDE, but (except when running unit tests) I actually prefer them to WebStorm now .  I know… sacrilege.

 

Synchronization Using Interval Tree Clocks in JavaScript

As a follow up to my previous post, I’ve implemented the Interval Tree Clock code in JavaScript with tests.  I’ve also begun a synchronization framework to go with it.

Github ITC in Javascript

The framework would be for synchronizing documents in full mesh mode — so peer to peer.

Everything has tests, so you can easily see the direction and progress by just reviewing the tests.

I stab the Synchronization with big knife

Synchronization With Interval Tree Clocks

Sync ProblemsI’ve been working with mobile devices for a long time, and inevitably the most painful piece of the development process is getting data to be consistent across all replicas.
For years, I’ve been trying to find a consistent means of taking care of this in a way which is OS and repository agnostic for all replicas. It isn’t 100% clear to me why this isn’t a solved problem, but I have a feeling there are several contributing factors:

  1. Internecine conflict between all relevant parties.
  2. Rapidly changing means and standards for data storage and transmission.
  3. Figuring out causal relationships between data on different replicas is really, really difficult.

It seems to me that number 1 and 2 having become somewhat better lately because of ubiquitous JavaScript.  I’m not saying it’s trivial, but you can make an app that works just about everywhere now if you write it in HTML and JavaScript.

When dealing with data, browser based apps are still likely to be a problem with large data sets and long periods without connectivity, but it might be worth exploring the possibilities again.

To this end, I’ve been looking at solving the causal problem with Interval Tree Clocks (ITCs) lately.  They are interesting in the way that licking battery terminals is interesting.  They are painfully tedious, but if you can stick with it, you may eventually power a solution (or be brain damaged).

For a long time, I think the standard way to handle the problem of causal relationships has been vector clocks, but they have well documented limitations around space usage which do not apply to Interval Tree Clocks.

Also, you can make pretty diagrams with ITCs.

ITC Node Diagram

So I’ve been trying to rewrite the ITC algorithm in C#.  This may seem ironic since I just told you that JavaScript seems to be one solution to some of the industry’s synchronization problems, but the reality is, I’m much better at exploring ideas with type safe code.

I’ve gotten most of the C# working, and I’ve created tests.  My intent is to use those to safely port the C# over to JavaScript.

You can check the code out here.

If you prefer Java, Erlang or c, there is a repository from the original designers of the algorithm here.  A word of warning: if you try to use that repository to follow along with my code, it will be very difficult.  Conceptually, the code is somewhat similar to what I have written, but my implementation is almost entirely different.