Talking SQL to NoSQL data stores (Part 2)
Talking SQL to NoSQL – Setting it up
In Part 1, we looked at what Cloud databases and NoSQL data stores are and why they are out there. The actual topic for this part is really Quest Software’s Toad for Cloud Databases which acts as a translator, making it possible to talk SQL to databases like Hadoop, Cassandra, SimpleDB, etc. that have no SQL engine at all.
Toad for Cloud Databases actually consists of two parts (you may also have a look at the architecture diagram):
- A “Data Hub” that does the translation job, i.e. it maps data structures from the NoSQL world to relational tables and translates SQL syntax to the NoSQL store’s specific API calls and vice versa.
- A graphical UI acting as client, connecting to the Data Hub, configuring the data hub and issuing SQL (depicted as a toad in the diagram 😉 )
As you can see, the data hub plays a central role in the whole concept. Technically spoken, it is a MySQL server. Why that? Because MySQL offers an easy way to plug in your own “storage engine”. Well-known and popular storage engines are e.g. MyISAM and InnoDB. Toad’s data hub uses its own, proprietary storage engine named “HUB”.
In this case, the name “storage engine” is a bit misleading because it stores no data at all, but only translates, i.e. it presents the NoSQL data structures as MySQL tables. I tend to call these “shadow tables” because they don’t actually contain any data, but just point to the data.
Querying a shadow table by an SQL SELECT makes the data hub pull the data from the NoSQL store, pipe it through the storage engine, thereby putting it into relational format and give back an SQL result set.
Issuing an SQL INSERT against a shadow table makes the data hub grab the values you want to insert, translate them into the NoSQL store’s format and execute the appropriate API calls. Similar for Updates and Deletes.
The data store can be connected to an arbitrary number of different cloud, NoSQL or even conventional databases (for conventional databases it uses a standard ODBC driver). The actual mapping is as follows:
- Each connected data store is mapped to one MySQL database in the data hub MySQL server.
- Each NoSQL data structure can be mapped to a shadow table in the corresponding MySQL database. The mapping can be influenced by the user, e.g. which MySQL table column maps to which HBase column/column family.
Thereby, the whole MySQL syntax is available for querying or manipulating data in the attached data stores (HBase, Cassandra etc.)
The data hub can run on Windows, Linux or inside an Amazon EC2 instance, given of course that it has connectivity to the data stores you want to connect to. If the data stores are in different network segments with no connectivity in between, different data hubs are needed.
If the data stores in question reside in Amazon EC2, e.g. SimpleDBs, it may be a good idea to have the data hub run in EC2 as well as this will keep network traffic and cost to a minimum (inter-instance network traffic is typically faster than between an EC2 instance and the internet, and Amazon doesn’t account for inter-instance network traffic inside the same EC2 region).
Registering the data stores and mapping the data structures into relational “shadow tables” can be done either with the Windows based graphical client (“Toad for Cloud Databases”) or with the Toad for Cloud Databases Eclipse plugin. Here is a screenshot how an HBase data store can be registered:
After finishing the registration and mapping of tables, any MySQL client can be used to access the shadow tables, and thereby the NoSQL data. So you could just use the standard MySQL command line. The Toad for Cloud Database graphical client offers a couple of useful features beyond pure connectvity and editing SQL, e.g. a graphical query builder, wizards for exporting, importing, comparing and syncing data as well as reporting tools.
In the next part, I will give some samples and use cases on how we can use this to build bridges between SQL databases and NoSQL stores.