Disaster Recovery in the cloud, Part 2

In Part 1 we talked about some general requirements for setting up a simple-to-use disaster recovery solution for Oracle databases.

Today I set up the same scenario inside the Amazon cloud, using different data centers to build a robust DR solution.

I used Amazon’s EC2 (Elastic Compute Cloud) for that. If you are new to Amazon EC2, have a look at this earlier series of blog posts covering EC2 basics.

Specifically I used two different availability zones inside Europe, called eu-west-1a and eu-west-1b. Think of these as two independent data centers run by Amazon, both located in Europe. The same scenario would work with e.g. eu-west-1a and us-east-1a, another Amazon data center located somewhere in Virginia, USA. Or we could even use different cloud providers to work around major issues concerning a single provider.

While I could have set up my own Oracle Home installation, for ease of use I preferred one of Oracle’s pre-built Amazon Machine Images. Oracle publishes a list of these; I used the one with AMI ID ami-8d97bcf9, containing Oracle Database 11g Release 2 Standard Edition (64-bit) on Oracle Linux 5.4.

Setup is as follows:

  1. Created an EC2 security group (firewall ruleset) opening up ports 22 (SSH), 1521 (Oracle listener) and 8081 (HTTP for Dbvisit’s Web Console).
  2. Fired up two instances of AMI ID ami-8d97bcf9: one called PrimaryDB in availability zone eu-west-1a, one called StandbyDB in availability zone eu-west-1b.
  3. Reserved two Elastic IP addresses and assigned one of them to each instance.
  4. Logged in as root to the primary instance (the only login possible initially) and just followed the wizard to build a database. Alternatively, it’s possible to cancel the wizard and use Oracle’s DBCA to build a database.
  5. Logged in as root to the standby instance, followed the wizard, but answered “N” when it asked “Would you like to create a database now?” The standby database will be built later as a copy of the primary.
  6. Adjusted the listener.ora file on both instances to listen on the Public DNS name associated with each instance’s Elastic IP address.
  7. Downloaded and installed the Dbvisit Standby software for Redhat Linux into both instances, under /u01/app/dbvisit.
  8. Established SSH public/private keys so that the oracle OS user can connect from each server to the other server without interaction.
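Step 8 can be scripted; here is a minimal sketch (the key file name and the remote hostname are placeholders of my choosing, not Dbvisit requirements):

```shell
# Run as the oracle OS user on both servers.
# Generate a passphrase-less RSA key pair (non-interactive):
mkdir -p "$HOME/.ssh"
rm -f "$HOME/.ssh/id_rsa_dbvisit" "$HOME/.ssh/id_rsa_dbvisit.pub"
ssh-keygen -t rsa -N "" -q -f "$HOME/.ssh/id_rsa_dbvisit"

# Then, once in each direction, append the public key to the other
# server's authorized_keys (<other-server> is a placeholder):
#   ssh-copy-id -i "$HOME/.ssh/id_rsa_dbvisit.pub" oracle@<other-server>
# and verify that no password prompt appears:
#   ssh -i "$HOME/.ssh/id_rsa_dbvisit" oracle@<other-server> hostname
```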

Then I browsed to Dbvisit’s Web console on the primary server, logged in and started the creation process for a standby database. This basically takes four steps:

  1. Set up and configure the standby environment: The wizard asks for everything around the planned standby environment. After finishing all the questions, nothing happens on the database yet, but a Dbvisit database configuration file (DDC file) is built.

    Configuring the standby environment

  2. A few manual modifications in this DDC file were needed because of some specifics of the EC2 cloud:
    As the host name of an EC2 instance is not fixed, I instructed Dbvisit to use the Public DNS name instead of the regular host name. This is done by setting:
    HOSTNAME_CMD = /u01/app/dbvisit/
    with the referenced file being a very small shell script containing:
    echo <Public DNS name of this server>
    As this Public DNS Name is only valid with all its components (Fully Qualified Domain Name), we need to set one more parameter:
  3. Then I created the standby database. This is one of the reasons I really like Dbvisit: it’s really just clicking a button, and it builds the standby database!

    Creating the standby database

  4. Schedule transfer and apply jobs, e.g. let it transfer and apply archived logs in a 5-minute interval.
    For the time being I wasn’t able to get that working in EC2, as the Web GUI got confused by the non-fixed hostnames.
    I reported this issue to Dbvisit and already got feedback that they are looking into it and will come back with a solution. As soon as this works, I will update this post!
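For illustration, the HOSTNAME_CMD wrapper from step 2 can be sketched like this (the script path /tmp/dbv_hostname.sh and the DNS name are placeholders; the real script lives under /u01/app/dbvisit/):

```shell
# Create a tiny wrapper script that just echoes the instance's
# Public DNS name (placeholder value used here):
cat > /tmp/dbv_hostname.sh <<'EOF'
#!/bin/sh
echo ec2-203-0-113-10.eu-west-1.compute.amazonaws.com
EOF
chmod +x /tmp/dbv_hostname.sh

# Dbvisit calls the script via HOSTNAME_CMD; it must print exactly
# the fully qualified Public DNS name:
/tmp/dbv_hostname.sh
```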

After setting up the service and the startup trigger as described in Part 1, let’s try to connect using this TNS entry, which contains the Public DNS addresses of both instances:
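For reference, here is a sketch of such an entry, following the pattern from Part 1 (the two Public DNS names are placeholders):

```
MYSERVICE =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = ec2-203-0-113-10.eu-west-1.compute.amazonaws.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = ec2-203-0-113-20.eu-west-1.compute.amazonaws.com)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = MYSERVICE))
  )
```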



Have fun trying out!





Disaster Recovery in the cloud, Part 1

There are lots of arguments for using the cloud, especially around infrastructure costs.

On the other hand, more often than not we have seen outages in the past where a cloud provider’s issue became a major issue, influencing clients that relied upon this cloud provider.

As an Oracle database guy, this makes me think about a good disaster recovery solution. Why not use the cloud’s benefits to cure its own downside? Normally a robust DR solution means you need at least two locations, i.e. two data centers. For SMBs (small and medium businesses) this is normally not an option. But even for larger companies it may be a challenge to provide that much infrastructure for just a couple of highly critical systems.

Now, the cloud offers us an easy way to rent some infrastructure based in Europe, and some other infrastructure based in a second European data center, or one in the US, or in Asia. Set up an Oracle standby database replicating between those, have your clients fail over between primary and standby database, and there you are.

Currently I’m working a lot with a DR solution called Dbvisit Standby. It’s similar to Oracle Data Guard, but from my point of view much easier to work with and – very important – it’s quite attractive from a pricing point of view and not bound to Oracle Enterprise Edition. I explicitly mention that because Oracle licenses are typically not included in a cloud provider’s infrastructure pricing (except with Amazon’s RDS for Oracle Standard Edition One)!

DR without the cloud

As a preparation, let’s have a short look at the infrastructure we need to do this on-premise, i.e. without any cloud involved:

  1. A primary database running on the primary database server.
  2. A standby database running on the standby database server (could be multiple standby DBs as well, but let’s keep it simple for now).
  3. Replication software, like Dbvisit Standby, Libelle DBShadow or – for Oracle Enterprise Edition only – Oracle Data Guard.
  4. Clients being aware of this and capable of automatically failing over and back between primary and standby.

Now, for today, I got exactly this running, using Dbvisit Standby as replication software:

Oracle Standby database with Dbvisit Standby

The only thing I had to add manually was the last item (automatic client fail-over):

Step 1: Create and start a service on the primary database (this one gets automatically replicated onto the standby database):

BEGIN
  DBMS_SERVICE.CREATE_SERVICE(
    service_name => 'MYSERVICE',
    network_name => 'MYSERVICE');
  DBMS_SERVICE.START_SERVICE(
    service_name => 'MYSERVICE');
END;
/

Step 2: A database startup trigger which checks on every startup (including graceful switchover or activation of the standby database) whether this database is the primary one; only then it starts the service, otherwise it stops it. This trigger is automatically replicated onto the standby as well:

CREATE OR REPLACE TRIGGER manage_clientconnectservice
AFTER STARTUP ON DATABASE
DECLARE
   role VARCHAR(30);
BEGIN
   SELECT database_role INTO role FROM v$database;
   IF role = 'PRIMARY' THEN
      DBMS_SERVICE.START_SERVICE('MYSERVICE');
   ELSE
      DBMS_SERVICE.STOP_SERVICE('MYSERVICE');
   END IF;
END;
/

That’s it. Now provide the following TNS entry to your clients:

MYSERVICE =
  (DESCRIPTION=
    (ADDRESS_LIST=
      (FAILOVER=on)
      (ADDRESS=(PROTOCOL=TCP)(HOST=<primaryhost>)(PORT=1521))
      (ADDRESS=(PROTOCOL=TCP)(HOST=<standbyhost>)(PORT=1521)))
    (CONNECT_DATA=(SERVICE_NAME=MYSERVICE)))

DR inside the cloud

Now, the only difference when doing this in a cloud is that primary and standby database each run in a cloud instance, e.g. inside the Amazon cloud.

So, my next post will describe how to set this up using two instances inside the Amazon EC2 cloud: primary in Europe, standby somewhere else.




Monitoring SQL Azure

Quest Software (the company I’m working for) just brought out a SQL Azure cartridge for its application performance management solution Foglight.

Unlike Spotlight on Azure and Spotlight on SQL Server Enterprise (which also covers SQL Azure), both tools focused on the Azure platform or on SQL Server/SQL Azure, Foglight is a much broader story, covering among others end user, application, database, OS and virtualization layer performance.

So, what does Foglight’s SQL Azure part look like?

For playing around, I used my existing Foglight environment covering on-premise Oracle and SQL Server. After importing the SQL Azure cartridge, the starting page looks like this:

Starting page in Foglight for Databases

Clicking on the SQL Azure button launched a short wizard with these steps:

  • Step 1: Choosing an agent manager: This is Foglight administrative stuff, determining where the remote collector agent is running. I used my local VM here. Alternatively, it could be deployed inside the Azure cloud.
  • Step 2: Providing an Azure instance and login credentials.
  • Step 3: Choosing one or more databases in the Azure instance.
  • Step 4: Validating connectivity and starting monitoring.

After that, the starting page looks like this. You may have to wait for a few minutes until all parts are actually filled with data:

Foglight for Databases Starting Page

The upper part shows the overall database environment, including the current alarm status. The lower part gives a quick overview of the selected Azure database. While the components are quite self-explanatory, most of them offer some more details when clicked.

The Home Page button at the lower right sets the focus to the selected Azure database:

Foglight view of one Azure database

Among others it shows the current alarms, availability, performance and storage capacity of the database, I/O from applications in the Azure cloud as well as from those outside (on-premise applications), and cost. Again, all of these items are clickable and lead to detailed drilldowns. The same drilldowns can be reached via the menu at the top.

Nearly all of the displayed information is gathered automatically from the Azure instance and/or database. Two optional pieces of information have to be provided manually in the Administration drilldown (see next screenshot):

  • The Data Center and Windows Live ID, displayed in the upper left corner.
  • Pricing information: Based on provided storage price and price for outgoing and incoming I/O, both per GB, the total storage cost is calculated and displayed in the upper right corner.

SQL Azure Administration drilldown

The only thing I couldn’t get working was the pricing information. From the error messages it looked very much like an issue with my German Windows mixing up dot and comma in the price values. OK, it’s the very first version of the SQL Azure cartridge… nevertheless, I already filed a CR with Quest support on this.

So, all this looks quite cool for diagnostics. But Foglight also claims to cover monitoring and alerting. Around a dozen alarms are pre-defined and can be adjusted in the Administration drilldown:

  • Database Reaching Maximum Limit Capacity
  • Database Requires Change In Edition
  • Database Reaching Current Limit Capacity
  • Long Lock Running
  • Response Time
  • Connection Time
  • Instance/Database Unresponsive
  • Database Property Changed
  • Collection Status
  • Executions Baseline Deviation
  • Total CLR Time Baseline Deviation
  • Total I/O Baseline Deviation

Alarm Rules Management

If you want to give it a try, you can start downloading here.

Have fun with it!

Scaling Oracle Databases on Amazon RDS

Easy and short-term scalability is one of the main arguments for cloud based solutions. If you can’t easily scale up and down, it’s not a cloud.
RDS (Relational Database Services) is Amazon Web Services’ DBaaS offering (Database as a Service), currently available for MySQL and Oracle databases.
I blogged about RDS in the past here and here.
Now, how does scaling work for an RDS based Oracle database?

Scaling vertically

Let’s start with scaling vertically: CPU and memory capacity are determined by the so-called “DB Instance Class”. E.g. a Large Instance (instance class db.m1.large) provides 2 cores with approx. 4.5 GHz in total and 7.5 GB of memory. Here is an overview of the instance classes available for RDS (ECU is defined by Amazon as the equivalent of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor):

Instance Class                     Internal Name   Memory    CPU       Platform
Small                              db.m1.small     1.7 GB    1 ECU     Linux 64-bit
Large                              db.m1.large     7.5 GB    4 ECU     Linux 64-bit
High-Memory Extra Large            db.m2.xlarge    17.1 GB   6.5 ECU   Linux 64-bit
High-Memory Double Extra Large     db.m2.2xlarge   34 GB     13 ECU    Linux 64-bit
High-Memory Quadruple Extra Large  db.m2.4xlarge   68 GB     26 ECU    Linux 64-bit

An official and up-to-date list of available instance classes can be found here.

If you look at your RDS instance in AWS Management Console, it shows you the current instance class as well as a “Modify” button at the top:

RDS overview in AWS Management Console, showing the current instance class

RDS overview in AWS Management Console, showing the current instance class

After clicking the Modify button it looks like this:

Changing instance class and other properties of an RDS instance

Changing instance class and other properties of an RDS instance

There are actually three ways available here for scaling up and down:

  • Changing the “Allocated Storage” allows us to provide more storage to the database. This is a scale-up only: currently there seems to be no way to scale down, i.e. to free up storage from deleted rows, deleted tables etc.
  • Changing the “DB Parameter Group”. This is not so obvious, but a DB Parameter Group is just a set of init.ora parameters, including db_cache_size, shared_pool_size, pga_aggregate_target, or – if you like automatic memory handling – 10g’s sga_target and 11g’s memory_target. So you could effectively resize the database memory by changing the DB Parameter Group. Nevertheless, in practice there is not much reason to do so: your DB instance is the only thing running on the underlying EC2 instance, and Amazon pre-sets the memory-related init.ora parameters to default values depending on the instance class, using nearly all available memory anyway.
  • Changing the DB Instance Class. This changes the available CPU and memory resources according to the table above. BUT: be aware that this change requires a short downtime. Therefore we have to pick one of two alternatives: applying the change later or applying it immediately. Later means during the next maintenance interval, which is also displayed here and could be as much as one week away. Applying the change immediately requires ticking the checkbox “Apply Immediately”; the instance then reboots right away, taking a couple of minutes until our database is available again with its new memory and CPU resources.

I’m not quite sure why Amazon always needs a reboot here. It’s probably due to restrictions in the underlying EC2 infrastructure, which is based on Xen virtual machines. In the VMware world it’s definitely possible to scale up CPU and memory resources online, at least for newer versions of Windows and Linux. See here for a good description of how to do that with Linux VMs.
I would definitely consider it an enhancement request to Amazon RDS to enable true online scalability for databases.

Also, there is no way to scale I/O capacity which is one of the most important resources for databases. That would be another enhancement request from my point of view.

Scaling via command-line tools

All actions described above can also be scripted and issued from the command line. For this we first have to download the RDS CLI tools, which are available in their latest version here. And here you will find a good starting point for setting up the tools.

Once installed, just give it a try:

$ # Scale-up to High-Memory Extra Large Instance during next maintenance interval
$ rds-modify-db-instance myrds1 --db-instance-class db.m2.xlarge
$ # Scale-up to High-Memory Extra Large Instance immediately (reboot)
$ rds-modify-db-instance myrds1 --db-instance-class db.m2.xlarge --apply-immediately
$ # Immediately scale-down again 
$ rds-modify-db-instance myrds1 --db-instance-class db.m1.large --apply-immediately
$ # Change storage to 150GB immediately
$ rds-modify-db-instance myrds1 --allocated-storage=150 --apply-immediately


Scaling horizontally

When it comes to scaling horizontally, we are talking about starting up additional or removing unneeded instances. This is a well-known technique for web servers and application servers where load balancers can handle the addition and removal of instances and make it transparent to the clients.

For databases, it’s a bit more challenging because databases are full of state. Just starting up a couple of database instances doesn’t provide any benefit as long as they are independent of each other.
One solution to this would of course be Oracle RAC. Currently, RAC is not available on Amazon Web Services, neither on EC2 nor on RDS, as there is no way of sharing storage between instances. Even if it were available, it could provide scalability of CPU and memory resources only. There is no I/O scalability with RAC, as there is always one single database with its storage underlying the different RAC instances.

Another way to scale CPU, memory AND I/O resources is already available for MySQL on RDS and may become available for Oracle on RDS as well. It is called “Asynchronous Read Replicas”: there is still one single instance for read-write access, which is replicated to a number of read-only replicas. These replicas can be used for database-level load balancing, at least for read-only access. Currently, when trying this with an Oracle instance, we get an error message like the one in the screenshot below.

Trying to scale horizontally with read replicas. Not yet available for Oracle on RDS

Nevertheless, there are hints that Oracle read replicas may be available in the future, possibly using Oracle Active Data Guard in the background.
Time will tell…

Monitoring Oracle databases running on Amazon RDS

An Oracle RDS instance differs from an Amazon EC2 instance running Oracle database in a couple of aspects, including:

  1. RDS allows SQL*Net access only. While it is probably backed by an EC2 instance internally, there is no visibility of this backend, and no SSH access. That means no agents or other server-side installations.
  2. An RDS instance has restricted access: SYSTEM or SYS login is not possible. One login with DBA role is provided, but even this DBA role is a reduced version of the standard DBA role; e.g. it lacks ALTER SYSTEM and a couple of other privileges.
  3. A couple of pre-defined licensing options are available, including (currently) Standard Edition One, Standard Edition and Enterprise Edition with Partitioning, Oracle Label Security, OLAP, Data Mining, Oracle Database Vault and Real Application Testing options.

So RDS is more of a challenge for all kinds of tools than EC2, especially for monitoring software. For the time being, no Oracle Enterprise Manager functionality is offered for RDS. Quest Spotlight and Foglight are not an option, as they require SYS access (at least for installation). Quest Foglight Performance Analysis requires SYS access and a server-side installation of a collector agent.

Load testing is not so much of an issue. As stated above, RAT (Real Application Testing) is available. Quest Benchmark Factory works as well, providing among others trace file replay and various TPC tests.

Amazon CloudWatch
The good news is that Amazon offers its own monitoring interface for some important metrics, called CloudWatch. For RDS-based Oracle instances this is already included in the pricing, delivering 11 different metrics at one-minute granularity:

  • CPUUtilization: The percentage of CPU utilization of the underlying instance
  • DatabaseConnections: The number of database connections in use
  • FreeableMemory: Available RAM (Bytes) of the underlying instance
  • FreeStorageSpace: Available storage space (Bytes)
  • SwapUsage: Amount of swap space (bytes) used on the underlying instance
  • ReadIOPS: Average number of disk read I/O operations per second
  • WriteIOPS: Average number of disk write I/O operations per second
  • ReadLatency: Average amount of time taken per disk read I/O operation
  • WriteLatency: The average amount of time taken per disk write I/O operation
  • ReadThroughput: Average number of bytes read from disk per second
  • WriteThroughput: Average number of bytes written to disk per second

The easiest way to access CloudWatch metrics is via AWS Management Console. After selecting a running RDS instance, it’s all on the Monitoring tab:

CloudWatch metrics for an Amazon RDS based Oracle database server

As soon as you have more than one RDS instance, want to monitor other parts of the Amazon infrastructure as well, or want to define alarms, the “Amazon CloudWatch” tab at the top of the console is your friend:

Amazon CloudWatch metrics overview

Alarms are threshold-based and send out emails when a threshold is broken for a specified amount of time.
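Such an alarm can also be defined with the CloudWatch command line tools. A sketch only: the alarm name, threshold values and the SNS topic ARN below are placeholders I made up, not values from my setup:

```
mon-put-metric-alarm --alarm-name myrds1-high-cpu \
  --metric-name CPUUtilization --namespace "AWS/RDS" \
  --dimensions "DBInstanceIdentifier=myrds1" \
  --statistic Average --period 300 --evaluation-periods 3 \
  --comparison-operator GreaterThanThreshold --threshold 80 \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:dba-notify
```

This would raise the alarm (and email the subscribers of the SNS topic) when average CPU utilization stays above 80 percent for three consecutive five-minute periods.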

An alternative is to pull metrics out of CloudWatch and integrate them into your own monitoring solution. There are a couple of APIs for that, including a set of command line tools.

Also have a look at this link, which is a good starting point for the CloudWatch command line tools as well as for the other CloudWatch APIs.

I installed the command line tools on a small Linux VM. Getting a list of available RDS metrics is as easy as follows:

$ mon-list-metrics --namespace "AWS/RDS"
CPUUtilization       AWS/RDS  {DBInstanceIdentifier=myrds1}
DatabaseConnections  AWS/RDS  {DBInstanceIdentifier=myrds1}
FreeableMemory       AWS/RDS  {DBInstanceIdentifier=myrds1}

Getting the average, minimum and maximum CPU utilization for RDS instance “myrds1” between 11:00am and 11:20am in five minute intervals is done like this:

$ mon-get-stats CPUUtilization --namespace "AWS/RDS" --statistics "Average,Minimum,Maximum" --dimensions "DBInstanceIdentifier=myrds1" --start-time 2011-09-13T11:00:00 --end-time 2011-09-13T11:20:00 --period 300
2011-09-13 11:00:00  1.196   0.94  1.29  Percent
2011-09-13 11:05:00  1.266   1.17  1.41  Percent
2011-09-13 11:10:00  1.08    0.7   1.53  Percent
2011-09-13 11:15:00  0.844   0.7   1.06  Percent

Wrap some awk or sed handling around this, put it into a scheduler of your choice, and you are ready to integrate it with your own monitoring. I did this for Quest Foglight, which allows defining your own dashboards, and it looks something like this:

Integrating Amazon RDS metrics from CloudWatch into Foglight
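To illustrate the awk part of that handling: a minimal sketch that turns mon-get-stats output lines (like the CPUUtilization sample above) into name=value pairs, here keyed by the interval timestamp:

```shell
# Reformat mon-get-stats output (date, time, average, min, max, unit)
# into one CPUUtilization:<time>=<average> pair per interval.
printf '%s\n' \
  "2011-09-13 11:00:00  1.196   0.94  1.29  Percent" \
  "2011-09-13 11:05:00  1.266   1.17  1.41  Percent" |
awk '{print "CPUUtilization:"$2"="$3}'
# prints:
# CPUUtilization:11:00:00=1.196
# CPUUtilization:11:05:00=1.266
```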

I want to mention two challenges I faced:

  1. Each call of mon-get-stats has to be parameterized with one specific statistic (e.g. CPUUtilization) and one dimension (e.g. DBInstanceIdentifier=myrds1). Therefore it’s not possible to read out all metrics with a single call. Instead, we need to issue one call per RDS instance and statistic. I implemented this using a loop running over the output of the first sample statement above (the mon-list-metrics call).
  2. Each mon-get-stats call takes approx. 10 to 20 seconds, so running them synchronously in the foreground is not a good idea. I got around this by running each mon-get-stats command inside the loop in the background and having it write to an output file. While all the calls are running in the background, a loop checks for running jobs and only moves on after all the background calls have finished.

See below for my sample code (which outputs data in the format needed by Quest Foglight, but should be easily adjustable to other output formats):

#!/bin/bash
# List of all RDS instances visible to the configured AWS account
rdsinstancelist=`rds-describe-db-instances | awk '/DBINSTANCE/ {print $2}'`
starttime=`date -u --date='5 minutes ago' +%Y-%m-%dT%H:%M:%SZ`
endtime=`date -u +%Y-%m-%dT%H:%M:%SZ`
# Collect all metrics for all instances, one background call each
for instancename in $rdsinstancelist; do
  echo $instancename > results_$instancename.txt
  dimension="DBInstanceIdentifier=$instancename"
  for metric in CPUUtilization DatabaseConnections FreeableMemory FreeStorageSpace ReadIOPS ReadLatency ReadThroughput SwapUsage WriteIOPS WriteLatency WriteThroughput; do
    echo $metric `$AWS_CLOUDWATCH_HOME/bin/mon-get-stats $metric --namespace "AWS/RDS" --statistics "Average" --dimensions "$dimension" --start-time $starttime --end-time $endtime -p 300` | awk '{print $1":"$5"="$4}' >> results_$instancename.txt &
  done
done
# Wait until all background calls have finished
numjobs=`jobs -r | wc -l`
while [ $numjobs -ne 0 ]; do
  sleep 5
  numjobs=`jobs -r | wc -l`
done
# Output the collected data in Foglight table format
echo TABLE rdsperfdata
for instancename in $rdsinstancelist; do
  while read oneline; do
    echo $oneline
  done < results_$instancename.txt
done

Have fun trying out!

Monitoring Oracle databases running on Amazon EC2

I recently built an Amazon Web Services (AWS) based test environment for our team, containing among other things a couple of Oracle database instances.

Some questions came to my mind after I had finished with instance creation:

  1. How to put some load onto it?
  2. How to measure what’s going on?

My first thought on both questions was: just use any load generation, benchmarking or monitoring tool you like. You could start easily and try out Quest Software’s Benchmark Factory for putting some load onto the database, and then start monitoring with Quest Spotlight, Foglight, or Oracle Enterprise Manager Database Control, Grid Control, Cloud Control or whatever.

Well… we have to remember that there are two Oracle database offerings on AWS:

  1. Running an Oracle database inside an EC2 instance (Infrastructure-as-a-Service, see here)
  2. Running an Oracle RDS instance (Platform-as-a-Service, see here)

This blog post is about EC2 instances. Another post describes how to do monitoring for RDS instances.
An Oracle database inside an EC2 instance is just like any other database in a Xen-based virtual machine: no technological difference. Any software that just connects to the database, either as a standard user or as SYSTEM/SYS, will behave normally. Therefore, things like SQL*Plus, Quest Toad, Spotlight, Foglight or even Benchmark Factory will just work.

Also, any software needing a server-side installation like Oracle Enterprise Manager or Quest Foglight Performance Analysis will just work.

BUT you have to consider this: if the software expects the database server to have a static IP address, you’d better keep an eye on that. Wherever an IP address or a DNS name is asked for during installation, use the EC2 instance’s “Public DNS Name”, which is resolved to the EIP (Elastic IP Address) associated with that EC2 instance!

Alternatively, deploy your EC2 instance into Amazon’s VPC (Virtual Private Cloud) and use a fixed IP address for the EC2 instance.

Having said that, look at the following screenshots. If you are used to Oracle Enterprise Manager or Quest Spotlight, these should look familiar, and indeed there is no difference: it’s just a standard database server running somewhere inside Amazon’s data centers.

Just remember to open up the SQL*Net port (e.g. 1521) and optionally SSH in Amazon’s Security Group configuration. If you plan to use Enterprise Manager Database Control, also open up port 1158 for accessing the Web Console.
And yet another hint: if you use another EC2 instance as the Spotlight client or as a monitoring server (e.g. OEM Grid Control), you have to open up the port providing not a CIDR IP range, but the EC2 instance’s security group as the source. See the next screenshot for what I mean…

Amazon EC2 Firewall settings for Oracle database access
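The same firewall rules can also be created with the EC2 command line tools. A sketch only, with placeholder group names and account ID:

```
# Open port 1521 to a CIDR range:
ec2-authorize my-db-group -P tcp -p 1521 -s 203.0.113.0/24

# Open port 1521 with another security group as the source
# (e.g. for a monitoring server running inside EC2):
ec2-authorize my-db-group -P tcp -p 1521 -o my-monitoring-group -u 123456789012
```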

Have fun trying out!

Speaking on UKOUG 2011 Conference

Q4 is going to be a conference quarter!

If you are at UKOUG 2011 (December 5th to 7th, Birmingham), feel free to walk into my presentation:

“For a Fistful of Dollars: Oracle on the Amazon Cloud”

It’s possible nowadays to run Oracle databases on the Amazon Cloud for a couple of dollars per month, including support and certification. Not all, but definitely some databases can benefit from this option. For which databases does this make sense? And how can customers as well as IT departments benefit from this? Patrick Schwanke, book author and Oracle ACE, shares his experiences and gives answers to these questions.

Probably taking place on Monday, 5th…