How to Save Costs on Non-Production SQL Server RDS Instances

By: Sadequl Hussain | Comments (5) | Related: > Amazon AWS RDS

Problem

We are trying to streamline our AWS resource expenses. The DBA team says we are running a number of RDS servers in our AWS accounts. Right now, we need those instances, so we can’t delete them. Is there any way we can save costs of those instances? Are there any quick wins we can have?

Solution

Amazon Relational Database Service (RDS) is a managed database service from Amazon Web Service. It offers a number of database platforms including Oracle, MySQL, PostgreSQL, MariaDB and Microsoft SQL Server. RDS has been widely adopted in the AWS community. Companies love RDS for its ease of deployment, high ROI and simple maintenance. With RDS, much of the overhead of building and maintaining a database infrastructure is gone.

However, RDS can also incur a large cost if not used properly. Companies with large database fleets in RDS can soon find their AWS monthly bill mostly consumed by the database service. Previously there wasn’t much anyone could do: non-essential RDS instances couldn’t be switched off after business hours like their EC2 counterparts. You could switch off a development EC2 server after 6:00 PM and turn it back on at 7:00 AM - this was not the case for RDS.

Fortunately, AWS now allows RDS instances to be shut down and restarted when necessary. This has been a much-awaited feature for RDS customers and no doubt will be embraced very quickly.

In this article, we will see how to shut down and restart RDS instances manually and how to automate the process. We will talk about the caveats of this feature and also some other methods you can consider when streamlining RDS costs.

Shutting down RDS Instances from AWS Console

It’s fairly straightforward to stop RDS instances from the RDS console. In the following image, we are seeing two RDS instances. One is stopped. The other (SQL Server) is running. We would like to stop the running instance. To shut down, we can select the instance from the RDS console and then select “Instance actions > Stop” from the top menu:

When shutting down an instance, RDS offers an option to make a snapshot of the instance. In our opinion, it’s always best to do this.

Similarly, we can start a stopped RDS instance by selecting the instance from the RDS console and choosing “Instance actions > Start” from the menu.

As you can imagine, this process is perfect for saving costs on non-production database servers. With this method, DEV or TEST instances can be shut down after business hours and started just before start of business. If an organization has a large number of non-production RDS instances, this can be a significant cost-saver.

Limitations

Like all good things, this feature comes with a price. As of this writing, an RDS instance can’t be shut down

If it has a read-replica (instances like MySQL, PostgreSQL) or is a read-replica itself
If it is using a multi-AZ deployment
If it is a SQL Server instance with mirrored databases (this is true for multi-AZ SQL instances)

Most RDS production environments should have a multi-AZ setup for failover purposes. This may not allow TEST or STAGING environments being shut down or restarted, because typically these environments are similar to PROD. Even if the systems are not needed after hours or after certain projects have gone live, they still can’t be stopped.

However, if there are simple development systems that don’t have read-replicas or don’t participate in multi-AZ, shutting those down during non-essential hours will be a wise idea.

Also, an instance can’t be kept shut down indefinitely. After an instance has been shut down for seven days, RDS will automatically start it. So, if an RDS instance has been shut down because it’s not needed for now, administrators need to ensure its shut down again as soon as it has come back online.

Automating the Stop and Restart Process

Stopping and starting RDS instances is simple when done from the console, but most DBAs and system administrators would want to put some sort of automation around this. This is where a little bit of scripting with AWS CLI can come handy.

AWS RDS CLI comes with two commands:

stop-db-instance
start-db-instance

The stop-db-instance command takes the name of an RDS instance as a mandatory parameter. There is also an optional parameter, db-snapshot-identifier. With this parameter, you can tell RDS to create a snapshot before the instance is shut down.

The start-db-instance command will take the name of a DB instance as the only mandatory parameter.

Both the commands output their results in JSON format. If the shutdown is successfully initiated, the JSON document should show a key-value pair of "DBInstanceStatus": "stopping". Similarly, if the start operation is successful, the JSON document’s DBInstanceStatus field should show “starting”.

If we try to shut down an instance that’s already stopped or in the process of stopping, we will receive an error message like this:

An error occurred (InvalidDBInstanceState) when calling the StopDBInstance operation: Instance <instance name> is not in available state.

Similarly, if we try to start a running instance, the error message will be like this:

An error occurred (InvalidDBInstanceState) when calling the StartDBInstance operation: Instance <instance name> is not stopped, cannot be started.

None of these commands are synchronous: this means the CLI only issues the command and outputs the response it receives from RDS. It doesn’t check if the shut down or start up is actually is proceeding or what stage the instance is in. We can parse the JSON document’ DBInstanceStatus field to find the status.

The code snippets below show two very simple bash scripts: one for stopping an instance (stop_rds.sh), the other for starting it (start_rds.sh). In both the scripts, we are taking a comma separated list of instance names as parameter. Note how we are creating instance snapshots with date and timestamp before stopping it. You can schedule both these scripts from cron jobs or scheduled tasks from a Jenkins server. One job can be schedule to run at say, 7:00 PM and the other at 7:00 AM.

stop_rds.sh

#!/bin/bash
 
instance_names=$1
region="us-east-1"
for db_instance in $(echo $instance_names | sed "s/,/ /g"); do
    db_instance_snapshot=${db_instance}"-snapshot-"`date "+%Y-%m-%dT%H-%M-%S"`
    stop_command="/usr/bin/aws rds stop-db-instance --region "$region" --db-instance-identifier "$db_instance" --db-snapshot-identifier "$db_instance_snapshot
    echo "Stopping RDS instance "$db_instance
    output=`$stop_command`
    echo "${output}"
done

The following command calls the script with two RDS instance names as parameters:

./stop_rds.sh data-dev-mariadb,data-dev-mysql

The output will look like this:

Stopping RDS instance data-dev-mariadb
{
    "DBInstance": {
        …
        …
        …
        "Endpoint": {
            "HostedZoneId": "XXXXXXXXXXXX",
            "Port": 3306,
            "Address": "data-dev-mariadb. xxxxxxxxxxx.us-east-1.rds.amazonaws.com"
        },
        "DBInstanceStatus": "stopping",
        …
        …
        …
        …
        "DBInstanceIdentifier": "data-dev-mariadb"
    }
}
…

start_rds.sh

#!/bin/bash
 
instance_names=$1
region="us-east-1"
for db_instance in $(echo $instance_names | sed "s/,/ /g"); do
    db_instance_snapshot=${db_instance}"-snapshot-"`date "+%Y-%m-%dT%H-%M-%S"`
    stop_command="/usr/bin/aws rds stop-db-instance --region "$region" --db-instance-identifier "$db_instance" --db-snapshot-identifier "$db_instance_snapshot
    echo "Stopping RDS instance "$db_instance
    output=`$stop_command`
    echo "${output}"
done

The following command calls the script with two RDS instance names as parameters:

./start_rds.sh data-dev-mysql,data-dev-mariadb

The output will look like this

{
    "DBInstance": {        …
        …
        …
        …
        "Endpoint": {
            "HostedZoneId": "XXXXXXXXXXXX",
            "Port": 3306,
            "Address": "data-dev-mysql.xxxxxxxxxxx.us-east-1.rds.amazonaws.com"
        },
        "DBInstanceStatus": "starting",
        …
        …
        …
        …
        "DBInstanceIdentifier": "data-dev-mysql"
    }
}
…

There are few points to be mindful of when running the automated process:

There are a finite number of RDS manual snapshots we can create in any region. If you need extra quota, you can always request a limit increase from AWS, but this is not a permanent solution. There needs to be a process of culling older snapshots. This ensures the instance stop process is not failing.
The EC2 machine running the cron job or the Jenkins server running the scheduled task needs to have proper permissions on the RDS instances in question. This can be done by assigning a role with appropriate permissions to the EC2 instance. The RDS instances’ security groups need to accept traffic from the EC2 instance.

Conclusion

Stop and start is a quick win for saving RDS costs. Other methods we can consider include:

Optimize instance sizes: You can make an audit of system resources in use. To do this, you can create CloudWatch dashboards with different RDS instances’ metrics. If CPU, disk space or memory for an instance isn’t saturated over a period of time (say six months), you can consider resizing it to lower specification. Similarly, unless necessary, multi-AZ setup shouldn’t be used for non-production environments.
Consolidate instances: Sometimes it’s better to use one large non-production instance than having multiple smaller ones. You can make an audit of databases from each non-production instance that can be migrated to a single instance. Once ready, the databases, users and jobs should be migrated to the new server. This process has one caveat though: database workloads from multiple systems may strain the consolidated server.
Delete “idle” instances: You may find there are instances with zero database connections for days, weeks or even months. Check with their stakeholders, make final snapshots and delete those instances.

Next Steps

Make a list of non-essential RDS instances in your AWS environment. Think what will be the best way to streamline their costs: should you stop and start them, should you delete some of them or should you consolidate them? Come up with a plan. Start small with only a handful of non-prod servers. Once you have some idea about the process, impact etc., you can repeat it for other instances too.

About the author

Sadequl Hussain has been working with SQL Server since version 6.5 and his life as a DBA has seen him managing mission critical systems.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips