Azure Cosmos DB: What's it all about?

Published on 17 Jan 2018

When you're looking for the most appropriate data storage solution for your project, you'll no doubt be taking many critical criteria into consideration.

What sort of data are you persisting? Structured? Relational? How soon after saving do you expect your changes to be reflected when you next retrieve that data? How fast do you expect your operations to be?

If you're a nerd (like me) or if you did a Computer Science degree (like me), you'll probably be familiar with the CAP Theorem, which boils down to three properties: Consistency, Availability, and Partition tolerance.

The thing is, you can only have two of them at once: Consistent and highly Available data; Consistent and Partition-tolerant data; or Available and Partition-tolerant data. You can't have all three at the same time. So which do you choose?
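To make that trade-off a bit more concrete, here's a toy sketch in Python. It isn't how any real database is built: `TinyStore`, `Replica`, and the "CP"/"AP" modes are names I've made up for illustration. It just shows what two replicas of one value might do when the network between them splits.

```python
class Replica:
    """One copy of a single value (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TinyStore:
    """Two replicas of one value, running in either "CP" or "AP" mode."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot talk to each other

    def write(self, replica, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the other replica is now stale.
            replica.value = value
            return
        # Healthy network: both replicas are updated together.
        self.a.value = value
        self.b.value = value

    def read(self, replica):
        return replica.value


# CP store: the write during the partition is rejected, the replicas still agree.
cp = TinyStore("CP")
cp.write(cp.a, 1)
cp.partitioned = True
try:
    cp.write(cp.a, 2)
    cp_accepted = True
except RuntimeError:
    cp_accepted = False

# AP store: the write is accepted, but the replicas now disagree.
ap = TinyStore("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)

print(cp_accepted, cp.read(cp.a), cp.read(cp.b))
print(ap.read(ap.a), ap.read(ap.b))
```

The CP store keeps both copies in agreement but turns the write away; the AP store keeps answering but leaves one copy out of date. Which of those hurts less depends on your application.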

Rack'em and stack'em

Back in the day, we would have had to think about deploying code to specific servers, knowing their capabilities intimately, and probably giving them "hilarious" names... I remember one place used Hammer Horror movies: "did you deploy to Frankenstein or to Bride of Frankenstein?"

A common scenario would have you deploying Web App 1 to Web Servers 1 and 2, talking to the database on Database Server 1, which mirrors to Database Server 2, and so on.

Nowadays, developing Cloud-based applications can unshackle you from thinking about all that complex hosting infrastructure.

The ease with which you can then globally distribute your applications is huge: without buying new hardware or rewriting your deployment scripts you can tweak a config and have your application spread across 8 data centres on 3 continents, each running 6 instances of your application. Demand suddenly subsides? Now you're only consuming the resources that you need, without expensive hardware sitting idle.

Now imagine your fantastic application, so scalable and widely distributed, accessing a database on one piddly database server that lives in one vulnerable data centre; doesn't feel as agile or robust, does it? If you're going the own-infrastructure route you'll want to improve reliability by having a failover disaster recovery solution - so you'll be investing in multiple database servers split across geographic regions configured with some serious replication.

This is still a common solution, but the associated price tag for that duplicated, specialised hardware and licensed software is high.

Which is why it wasn't long after the first Cloud hosting solutions were born that data storage solutions started to pop up in the cloud too, but not without teething problems. The first version of SQL Azure for instance was...Ahem...fun. Amazon's had its share of hiccups too, but overall the big players have really tightened up their game and Cloud storage is more reliable than ever.

So hosted SQL is the answer?

Hosted SQL is great if you have relational data and a large budget, but what if you have less relationally structured data, like a document with tags, or data which will only ever be queried by one specific index and not joined on other, separate, data? Or how about if you just have a much smaller budget?

If you're dealing with your own infrastructure and have non-relational data to deal with, then you'll probably have installed and set up something like MongoDB, Raven, or similar.

There have been cloud solutions for this for some time; AWS's DynamoDB, Azure's Table Storage, for example, each with their own limitations (and sometimes confusing APIs).

This is where Azure Cosmos DB comes in - off the bat, Microsoft's new offering supports the MongoDB API, the Table Storage API, and a Graph API - you can even query using SQL, all at the change of a connection string.

But why would you want to? Great question! Such wisdom in someone so young. Let's dig in and find out.

Cosmos DB FTW

If you enjoy managing the nuts and bolts of a database server, then Cosmos DB probably isn't for you. If, however, you just want to point your application at a data store via a connection string, and have that data globally distributed with configurable consistency levels, then you should definitely check it out.

It has limitless elastic scale around the globe, and you can tweak the availability, consistency, and latency to match your requirements for each instance. It can easily horizontally scale to enable huge throughput and cater for unpredictable bursts in traffic.
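If "tweaking consistency" sounds abstract, here's a rough Python sketch of the idea. To be clear, this is not the Cosmos DB SDK or anything like its internals: `LaggingStore` and its "strong" and "eventual" read modes are a made-up toy with one primary copy and one replica that lags behind. Cosmos DB itself offers five levels (Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual); this only models the two extremes.

```python
from collections import deque


class LaggingStore:
    """A primary copy plus one replica that lags behind (hypothetical toy)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        # Writes land on the primary first and are queued for the replica.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes to the replica, in the order they were made.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

    def read(self, key, consistency):
        if consistency == "strong":
            # Always the latest write (in real life, at a higher latency cost).
            return self.primary.get(key)
        if consistency == "eventual":
            # Whatever the replica has right now, which may be stale.
            return self.replica.get(key)
        raise ValueError("unknown consistency: " + consistency)


store = LaggingStore()
store.write("greeting", "hello")
store.replicate()
store.write("greeting", "hi")  # not yet replicated

strong_read = store.read("greeting", "strong")
stale_read = store.read("greeting", "eventual")
store.replicate()
caught_up = store.read("greeting", "eventual")

print(strong_read, stale_read, caught_up)
```

The eventual read returns the old value until replication catches up, while the strong read always sees the newest write. The intermediate levels sit between those two, trading a bit of freshness for speed and availability.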

I bet it's expensive though, right? Well, right now it's around three times cheaper than AWS DynamoDB. Crazy.

Uh-oh, Microsoft Fanboy much?

Not quite! I'm all about options. To name but a few: Google have their amazing Cloud Spanner, there's EnterpriseDB (Postgres), and Rackspace do a mean MySQL package.

These multi-API, multi-model, globally distributed, massively scalable data solutions are the future of cloud computing.

If you have a cloud deployed application, I urge you to check these out.
