As is well known, there are two types of people: 1. People who do backups 2. People who will start doing backups.

Let us guess: you still have some project with your data, or even the data of your application's users, without a backup? Why don't you do it:

  1. You might think it will cost you a lot of extra money
  2. You don't want to spend time on it

This post provides a ready-to-use solution that minimizes both problems in one shot.

We are talking about S3 Glacier, a service inside Amazon Web Services. According to the Glacier pricing, it costs only $0.004 per GB per month in US regions. If your gzipped database takes 100 MB, you can create a backup every week; after a year you will have collected ~5.2 GB of data, so it will cost you only ~$0.02 per month. Is that a good price for peace of mind?
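The arithmetic above is easy to check yourself - a quick sketch (the 100 MB backup size and the weekly schedule are just the example numbers from this post):

```shell
# Estimate Glacier storage cost after one year of weekly backups.
# Assumptions: ~0.1 GB per gzipped backup, 52 backups per year,
# $0.004 per GB-month (US regions).
awk 'BEGIN {
  gb_per_backup = 0.1
  backups = 52
  price_per_gb_month = 0.004
  total_gb = gb_per_backup * backups
  printf "Stored after a year: %.1f GB\n", total_gb
  printf "Monthly cost: $%.3f\n", total_gb * price_per_gb_month
}'
```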

With this price you will never want to delete backups - it is cheaper to just leave them there forever. Glacier provides 99.999999999% durability, which means you can always stay calm: your data will be there. Glacier uses extra-cheap storage media such as magnetic tapes or optical disks, served by robots in special storerooms, which is why data retrieval may take up to several hours.

So now you know about Glacier - but how do you start using it while spending minimal time?

The simplest way to use Glacier for backups of any data:

1. Create an AWS account or log into an existing one. Then create a new user with programmatic access and permissions for S3 Glacier.

2. Download the glacieruploader jar. You can do it with wget:

wget https://github.com/MoriTanosuke/glacieruploader/releases/download/glacieruploader-0.1.1/glacieruploader-impl-0.1.1-jar-with-dependencies.jar

3. Create a Glacier vault in the AWS Console. I intentionally selected the us-east-1 region - even if you are far away from the US, backup speed is not important, and Glacier prices are cheaper there.

4. Create a bash script, e.g. with nano /home/user/backup.sh:

#!/bin/bash

cd "$(dirname "$0")"

# enter any filename you want
filename=docker_volume_$(date +"%d_%m_%Y").tar.gz
echo "Doing backup $filename" >> /tmp/backup.log

# you can select a folder for backup here
GZIP=-9 tar zcf "$filename" /var/lib/docker/volumes/stackname_db-data/_data

# copy ACCESS_KEY_ID and SECRET_ACCESS_KEY pair from your newly created user
export AWS_ACCESS_KEY_ID=XXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=YYYYYYYYYYYYYY

# make sure you have the same region in endpoint URL, as in your vault
java -jar glacieruploader-impl-0.1.1-jar-with-dependencies.jar --endpoint https://glacier.us-east-1.amazonaws.com --vault backups --upload "$filename"

echo "Doing backup $filename done" >> /tmp/backup.log

Make the script executable:

chmod +x /home/user/backup.sh
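Before waiting for cron, you can sanity-check the date-stamped filename pattern the script will generate (the same pattern as in the script above):

```shell
# Preview today's backup filename without actually creating an archive
filename=docker_volume_$(date +"%d_%m_%Y").tar.gz
echo "$filename"   # e.g. docker_volume_05_02_2024.tar.gz
```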

5. Use a scheduler, e.g. cron, to run the backup script periodically. For example, to execute it at 05:00 every Monday, edit the crontab file with the crontab -e command and type:

0 5 * * 1 /home/user/backup.sh

Then save the file.
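If weekly backups are too sparse for your data, the schedule is easy to adjust - a couple of hypothetical crontab entries (the five fields are: minute, hour, day of month, month, day of week):

```shell
# every day at 03:00
0 3 * * * /home/user/backup.sh
# on the first day of every month at 05:00
0 5 1 * * /home/user/backup.sh
```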

To recover backups, use a utility such as FastGlacier.
The price for Standard retrieval is ~$0.01 per GB, so restoring one 100 MB backup will cost you ~$0.001. Amazon also charges for requests to Glacier, which FastGlacier will issue before downloading; requests cost $0.05 per 1,000, which is also a very small charge.
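As with storage, the retrieval numbers are simple to verify - a sketch with the example figures above (the request count is a guess; the number FastGlacier actually makes may differ):

```shell
# Estimate the cost of restoring one 100 MB backup with Standard retrieval.
# Assumptions: $0.01 per GB retrieved, $0.05 per 1,000 requests,
# ~10 requests made by the client tool (an assumption, not a measured value).
awk 'BEGIN {
  gb = 0.1
  retrieval = gb * 0.01
  requests = 10 * 0.05 / 1000
  printf "Retrieval: $%.4f\n", retrieval
  printf "Requests:  $%.4f\n", requests
  printf "Total:     $%.4f\n", retrieval + requests
}'
```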

Some recommendations to minimize backup expenses and time:

  1. When you develop your application, keep application state separate and in one place (one or several folders). For example, in Docker use volumes and back up only the volumes. The codebase (the stateless part) should be stored in a repository, not backed up.
  2. Use the maximum GZIP compression level. In the script it is -9, the strongest compression.
  3. Calculate the size of your data and the costs, and choose your backup period accordingly.
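On recommendation 2, it is easy to see the effect of the compression level yourself - a small experiment on throwaway sample data (the filenames here are arbitrary):

```shell
# Generate a repetitive sample file, compress it at the lowest and
# highest gzip levels, and compare the resulting sizes
yes "sample database row with some repeated content" | head -n 100000 > sample.txt
gzip -1 -c sample.txt > sample.fast.gz
gzip -9 -c sample.txt > sample.best.gz
wc -c sample.txt sample.fast.gz sample.best.gz
```

On real database dumps the gap between -1 and -9 is usually smaller than on such repetitive data, but -9 rarely loses on size - it only costs more CPU time.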