Talking SQL to NoSQL data stores (Part 1)
Cloud databases and NoSQL data stores – what’s it about?
What is a cloud database, and why is it supposed to be useful? Actually, the idea behind it is quite similar to what is behind the rest of the cloud computing story: Providing databases (and hopefully an approriate SLA specifying availability and service quality) on a self-service basis, provisioning a whole database server in minutes instead of days, paying per actual usage. The user (let’s call him consumer) doesn’t necessarily need to care about how the SLA is achieved, e.g. which hardware is used, things like clustering, replication, database or OS patching and upgrading.
Sometimes this is described as DaaS (Database as a Service), and it is a special case of PaaS (Platform as a Service). Just as a side note: this is a lot different from IaaS (Infrastructure as a Service), e.g. running an Oracle database inside an Amazon EC2 instance. In this case, the DBA still has to do all the mentioned stuff like patching, upgrading etc.
SQL Azure would be an example of a cloud database. It is very similar to SQL Server (actually it is based on SQL Server, accessed through a gateway that abstracts away any infrastructure related stuff, see here for an in-depth article from Kalen Delaney). Another example would be Amazon’s RDS which offers MySQL instances as service and – announced for Q2/2011 – also Oracle instances as service.
That said, what is NoSQL about? Well, another quality that is expected from a cloud is scalability or “elasticity”, i.e. easy scale-up and scale-down. Any application that needs to scale and that is running on top of a database, will only scale well if the underlying database does so. Imagine a given SLA that can be fulfilled by starting up one standard database instance (Oracle,SQL Server,DB2,Sybase, whatever you like) and one application server instance. Now, if your application gets a real big success story and you have to meet the same SLA with 200 times the load, you will probably need two things:
- Startup 200 application server instances AND have a good load balancing
- Startup 200 database instances AND have a good load balancing
The latter point is where NoSQL comes in. Traditional databases are just not well equipped for scaling this much. Their clustering is more about high availability (failover cluster) or – like Oracle RAC – partial load balancing as it means having several instances, but still one single database.
NoSQL data stores are not prominent for not talking SQL, this is just a side effect (actually NoSQL is mostly interpreted as “Not Only SQL”). They are prominent because of their virtually unlimited scaling capabilities. You need more power? Start up 100 additional instances. The NoSQL store’s engine will take care of things like replication, redundant storage, load distribution. Under the bottom line, this is a real compute cluster, or a compute cloud.
Main drivers for NoSQL data stores in the past have been the various social media applications we see today, but also massive parallel online gaming, and other portals with 5-digit up to 9-digit user numbers. Traditional databases just have a very tough time when trying to cover these requirements with an acceptable pricing.
So, from a developer’s point of view, NoSQL data stores are just the right for the job. They are specialized, lightweight engines with very simple APIs (typically no complex SQL engine), but with extreme scalability. Each NoSQL store is highly optimized for its respective purpose. Administration efforts are supposed to be very low, e.g. some of these systems don’t have a backup concept at all as they provide highly redundant storage anyway.
- HBase and Cassandra are typically called “Columns Stores” or “Wide Column Stores” where the data is physically stored column-wise, not row-wise like traditional database systems.
- Amazon’s SimpleDB, Microsoft’s Azure Table Storage and the well-known BerkeleyDB are so-called “Key Value Stores”, i.e. they store simple key-values pairs, similar to Windows registry entries.
- MongoDB or CouchDB have specialized on storing documents, specifically those in JSON format, so they are commonly called “Document Stores”.
- “Graph Databases” like Neo4J are not yet that widely adopted, but are optimized for storing and processing graph related data, e.g. for GPS systems.
This site gives a great overview on the NoSQL universe.
In the next part, I will show how to use Quest’s Toad for Cloud Databases to talk SQL to NoSQL data stores. As SQL depends on at least some relational data structure, this is easier for Column Stores and Key-Value Stores than for Document Stores. Therefore the former two types of systems will be our focus.