Disaster Recovery in the cloud, Part 1
There are lots of arguments for using the cloud, especially around infrastructure costs.
On the other hand, we have repeatedly seen outages in which a problem at a cloud provider turned into a major incident for every client relying on that provider.
As an Oracle database guy, I immediately start thinking about a good disaster recovery solution. Why not use the cloud’s strengths to cure its own weakness? A robust DR solution normally means you need at least two locations, i.e. two data centers. For SMBs (small and medium-sized businesses) this is usually not an option. But even for larger companies it can be a challenge to provide that much infrastructure for just a couple of highly critical systems.
Now, the cloud offers us an easy way to rent some infrastructure based in Europe and some more in a second European data center, or in one in the US or in Asia. Set up an Oracle standby database replicating between those locations, have your clients fail over between the primary and the standby database, and there you are.
Currently I’m working a lot with a DR solution called Dbvisit Standby. It’s similar to Oracle Data Guard but, from my point of view, much easier to work with. Just as important, it’s quite attractive in terms of pricing, and it isn’t tied to Oracle Enterprise Edition. I mention that explicitly because Oracle licenses are typically not included in a cloud provider’s infrastructure pricing (the exception being Amazon’s RDS for Oracle Standard Edition One)!
DR without the cloud
As a starting point, let’s take a short look at the infrastructure we need to do this on-premises, i.e. without any cloud involved:
- A primary database running on the primary database server.
- A standby database running on the standby database server (could be multiple standby DBs as well, but let’s keep it simple for now).
- Replication software such as Dbvisit Standby, Libelle DBshadow or (for Oracle Enterprise Edition only) Oracle Data Guard.
- Clients that are aware of this setup and able to fail over and fail back automatically between primary and standby.
Today I have exactly this setup running, using Dbvisit Standby as the replication software.
The only part I had to add manually was the last item, automatic client failover:
Step 1: Create and start a service on the primary database (the service definition is replicated to the standby database automatically):
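BEGIN
  -- Define the service that clients will connect to
  dbms_service.create_service(
    service_name => 'MYSERVICE',
    network_name => 'MYSERVICE',
    goal         => DBMS_SERVICE.GOAL_NONE
  );
END;
/

BEGIN
  -- Start the service on the current (primary) database
  dbms_service.start_service(service_name => 'MYSERVICE');
END;
/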
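To confirm that the service definition was actually stored, a query along these lines against the standard DBA_SERVICES dictionary view should do. It is only a sketch; the column selection is a suggestion, and on the standby it can only be run once that database is open (e.g. after activation or a switchover), not while it is merely mounted.

```sql
-- Show the service definition stored in the data dictionary.
-- Because it lives in the dictionary, it travels to the standby
-- together with the rest of the database.
SELECT name, network_name
FROM   dba_services
WHERE  name = 'MYSERVICE';
```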
Step 2: Create a system event trigger that fires after database startup (including a graceful switchover or the activation of the standby database), checks whether this database is the primary one, and starts the service only in that case; otherwise it stops the service. This trigger is replicated to the standby automatically as well:
CREATE OR REPLACE TRIGGER manage_clientconnectservice
AFTER STARTUP ON DATABASE
DECLARE
  v_role VARCHAR2(30);
BEGIN
  -- Determine the role this database currently has
  SELECT database_role INTO v_role FROM v$database;
  IF v_role = 'PRIMARY' THEN
    -- Only the primary offers MYSERVICE to clients
    DBMS_SERVICE.START_SERVICE('MYSERVICE');
  ELSE
    DBMS_SERVICE.STOP_SERVICE('MYSERVICE');
  END IF;
END;
/
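To check that the trigger did its job after a restart or switchover, one option is to compare the database role with the list of active services. The sketch below uses the standard V$DATABASE and V$ACTIVE_SERVICES views and is meant to be run on whichever database is currently open.

```sql
-- One row per database: its role, plus MYSERVICE if it is active.
-- On the primary, ACTIVE_SERVICE should read MYSERVICE;
-- on a non-primary database it should be NULL.
SELECT d.database_role,
       s.name AS active_service
FROM   v$database d
LEFT JOIN v$active_services s
       ON s.name = 'MYSERVICE';
```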
That’s it. Now provide the following TNS entry to your clients:
MYDB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = <primaryhost>)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = <standbyhost>)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = MYSERVICE)
    )
  )
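A simple way to see which database a client actually reached through the MYDB alias is to ask the session itself via SYS_CONTEXT('USERENV', ...). Running this before and after a switchover should show SERVER_HOST moving from the primary host to the standby host. The <user> placeholder in the comment is hypothetical; use any account that exists on both databases.

```sql
-- Connect through the alias first, e.g.: sqlplus <user>@MYDB
-- Then check where the session landed.
SELECT sys_context('USERENV', 'SERVICE_NAME')   AS service_name,
       sys_context('USERENV', 'DB_UNIQUE_NAME') AS db_unique_name,
       sys_context('USERENV', 'SERVER_HOST')    AS server_host
FROM   dual;
```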
DR inside the cloud
The only difference when doing this in the cloud is that the primary and the standby database each run in a cloud instance, e.g. in the Amazon cloud.
So my next post will describe how to set this up with two Amazon EC2 instances: the primary in Europe, the standby somewhere else.