Jay Gould

Setting up automated PostgreSQL backups using Node and Shell script

March 26, 2019

In the old days, handling automated backups with a PHP-based system was fairly easy, and often free, depending on your hosting provider. Some providers offered software on the server to handle backups (such as cPanel backup systems), but you could also write a script to handle the backup process yourself and customise the output. This post will give a quick, easy and free solution to backing up a PostgreSQL database in Node, both in development and production environments, and send the backups off to a remote location.

For the purposes of this post I'll be using Heroku for my remote server which holds the database I'm backing up. This is easily substituted for any server environment such as AWS or Digital Ocean, as the only difference between environments is how the cron job is initiated. More on that later though!

Using pg_dump

pg_dump is the key to setting up automated backups. It is a utility for backing up PostgreSQL databases, and is installed along with Postgres. I installed my Postgres on my Mac using Homebrew, but pg_dump comes along with the installation however it's installed. Here's a useful link to help with the installation if you are starting from scratch.
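
If you're on a Mac with Homebrew like me, the install is roughly the following (adjust for your own platform and package manager):

brew install postgresql
brew services start postgresql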

Once installed, you can go ahead and run the following command to backup the database:

pg_dump --username=<USER> <DATABASE>

If you get an error here that a database or user doesn't exist, it means you need to select a database which actually exists on your system. Simply run psql and then, in the interactive prompt, run \list to list all databases.
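
For example, the quick check looks something like this (\q quits the prompt afterwards):

psql
\list
\q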

This will dump the database to the terminal window. This is obviously not too helpful in the context of this post, but pg_dump can be configured to output to a file. Here’s the command to output to a file:

pg_dump --username=<USER> <DATABASE> -f ./<FILENAME>.tar -F t

The -f tells the system we want to output to a file. We then add the location and filename we want to be outputted, and finally the -F t tells the system we want the file in a specific format, and that format is a "tar" archive (dictated by the t), which can easily be re-imported with pg_restore. Note that the tar format itself isn't compressed; if you want compression, Postgres' custom format is compressed by default.

Once that is run, you’ll see the backup file with the name of your choice added to the current directory. Excellent.
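
If you'd rather have that compressed custom format, the command is almost identical. This is also the format I end up using for the production backup later in this post:

pg_dump --username=<USER> <DATABASE> -f ./<FILENAME>.dump -F c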

Running the backup from a .sh file

As the project gets bigger you may want to add more logic to the export. Perhaps you want the export file to be named as the current date and time, or sent to a different location depending on a number of factors. This is much easier to handle if the backup script mentioned above is contained in a Shell file, which can be executed by Node.

To create a Shell file, run touch backup.sh. Open this file in your editor, and add the following line at the top:

#!/bin/bash

This shebang line indicates the file is a bash script and should be executed as such. Add the pg_dump command inside this file, so the backup.sh file now looks like:

#!/bin/bash
pg_dump --username=<USER> <DATABASE> -f ./<FILENAME>.tar -F t

You can then run the file with ./backup.sh in your terminal (you may need to make the file executable first). As this is just a bash script, it will execute the command.
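
A minimal run looks like this; the chmod only needs to be done once:

chmod +x backup.sh
./backup.sh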

Executing the bash script from Node

I prefer to execute bash scripts in Node as I’m more comfortable writing logic in Node rather than in a Shell file using bash. Bash gets more complicated with loops and conditional statements, and it also looks ugly (sorry if I offended any Bash people out there).

Luckily, Node comes with ways to handle executing files, as it's needed quite often. This is done with Node's child_process module, which has a few different methods for executing files and scripts. We'll be focusing on execFile, as this is the relevant one here. I'll do a post on the others soon as they are so useful.

First, install dotenv. If you don’t know what this is, it basically allows you to set environment variables for use within Node. Create a .env file in the root of your project, add it to your .gitignore, and you’ll then be able to access variables set inside this file. For example, if you add DB_NAME=mydatabase to the .env file, you’ll be able to access this database name within your Node project by using process.env.DB_NAME. This is required in order to set up backups for your remote database later on, but we’ll get on to that.
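
As an example, the .env file for this post ends up looking something like the below. The values are placeholders, and DB_HOST is only needed once we back up the remote database later on:

DB_USER=myuser
DB_PASSWORD=mypassword
DB_NAME=mydatabase
DB_HOST=myhost.example.com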

Next, touch run_backup.js and open it in your editor. Add the following to this new file:

// run_backup.js
require("dotenv").config()
const execFile = require("child_process").execFile
const date = new Date()
const current_date = `${date.getFullYear()}-${
  date.getMonth() + 1
}-${date.getDate()}-${date.getHours()}-${date.getMinutes()}`
const backup_file_ext = `export_${current_date}.tar`

let backup_script = `pg_dump --username=${process.env.DB_USER} ${process.env.DB_NAME}`

var script = execFile(
  `./backup.sh`,
  [backup_script, backup_file_ext, process.env.DB_PASSWORD],
  (error, stdout, stderr) => {
    if (error !== null) {
      console.log(`exec error: ${error}`)
    }
    console.log("Backup complete!")
  }
)

All my database credentials are in my .env file as this is more secure (it's not added to version control). First off, dotenv is included, which is required to access the environment variables. Then child_process's execFile is required and assigned to a variable. The filename is then generated using the current date and time, and added to a variable to ensure the backup file is identifiable and unique. Finally, the execFile function is used to run the backup Shell script.

The execFile function is the key here. The second argument is an array of values which are passed down to the Shell file. These values are accessed within the Shell file using the following syntax:

#!/bin/bash
# ${1} is the pg_dump command, ${2} is the output filename, ${3} is the database password
PGPASSWORD="${3}" ${1} > ./${2}

As you can see, the numbers correspond to the positions of the items in the array passed to the execFile function (bash positional parameters start at 1, so ${1} is the first item in the array, ${2} the second, and ${3} the third).

One more thing to mention is that the database password is passed down to the backup.sh Shell file now. This is because when we move over to running this on the server, the password is needed within the Shell file.

The Shell file and run_backup.js file are both added to version control, so it’s important that the password and other details are not in version control, and instead handled by environment variables.

Now to process the backup, simply run node run_backup.js, and watch that lovely file appear in your directory.

Sending the backup file to a remote location

Most likely we don't want to keep the backups on our local or production systems. Instead we want to send them to a remote server which contains just the backups, in order to keep them secure and in a useful place. This could be an S3 bucket or a server account.

First, install ssh2-sftp-client to handle the file transfer. Include it at the top of the file, and then use the client to connect to the server and send the file you just backed up:

// run_backup.js
const fs = require('fs');
const Client = require('ssh2-sftp-client');
let sftp = new Client();
...

let backup_script = `pg_dump --username=${process.env.DB_USER} ${process.env.DB_NAME}`;

var script = execFile(
  `./backup.sh`,
  [backup_script, backup_file_ext, process.env.DB_PASSWORD],
  (error, stdout, stderr) => {
    ...

    sftp.connect({host: '', port: '', username: '', password: ''}).then(() => {
      sftp
        .fastPut(backup_file_ext, '/' + backup_file_ext)
        .then(() => {
          fs.unlink(backup_file_ext, () => {
            script.kill('SIGINT');
          });
        });
    });
  }
);

After adding your real credentials for your remote location, running the script again using node run_backup.js will process the backup, save the file to your local machine, send the same file up to the remote location, and finally remove the file from your local machine.

Running the backup from a remote location

It’s fine backing up your local data, but you may also want to back up the live/production application data too. As I mentioned earlier, we want to be using the same files on local and remote instances, and relying on environment variables to dictate which system we are backing up.

First we want to differentiate between our local and remote. We can do this by manually setting an environment variable for our environment. This is typically called NODE_ENV. Add the following line to your local .env file: NODE_ENV=development. Then on your remote server, you’ll want to add: NODE_ENV=production.

How you change your remote variables will depend on the hosting platform. I am using Heroku for this example, so my NODE_ENV is set in the Heroku dashboard under Reveal config vars in the settings tab of my project. You may need to manually add a .env file if running on AWS.
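
If you prefer the terminal to the dashboard, the Heroku CLI can manage the same config vars, for example:

heroku config:set NODE_ENV=production
heroku config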

Then update your run_backup.js to contain the following final code:

// run_backup.js

require("dotenv").config()

const execFile = require("child_process").execFile
const fs = require("fs")
const Client = require("ssh2-sftp-client")
let sftp = new Client()

const date = new Date()
const current_date = `${date.getFullYear()}-${
  date.getMonth() + 1
}-${date.getDate()}-${date.getHours()}-${date.getMinutes()}`
const backup_file_ext = `export_${current_date}.dump`

let backup_script
if (process.env.NODE_ENV == "development") {
  backup_script = `pg_dump --username=${process.env.DB_USER} ${process.env.DB_NAME}`
} else {
  backup_script = `pg_dump --username=${process.env.DB_USER} -Fc ${process.env.DB_NAME} -h ${process.env.DB_HOST}`
}
var script = execFile(
  `./backup.sh`,
  [backup_script, backup_file_ext, process.env.DB_PASSWORD],
  (error, stdout, stderr) => {
    if (error !== null) {
      console.log(`exec error: ${error}`)
    }
    // your SFTP connection details, ideally also read from environment variables
    sftp.connect({ host: "", port: "", username: "", password: "" }).then(() => {
      sftp.fastPut(backup_file_ext, "/" + backup_file_ext).then(() => {
        fs.unlink(backup_file_ext, () => {
          script.kill("SIGINT")
        })
      })
    })
  }
)

The important note here is that the backup script now includes a process.env.DB_HOST variable. This is given to you when you add a Postgres database to Heroku, and is added to your config vars alongside the DB_PASSWORD and DB_USER.

Again, to run this on your remote or local system, simply cd to the directory of your JS file and run node run_backup.js. To run it on your Heroku server, first run heroku run bash to get a bash terminal on your Heroku instance.

The final step - scheduling the script

Finally, once your scripts are working on local and remote, it's time to make the script run automatically every few hours or every day. This can be easily accomplished with cron jobs, or on Heroku, there's an addon called Heroku Scheduler. Add this to your account and follow the instructions. The command to schedule is node run_backup.js (as we ran locally), and it's as easy as that!
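
On a non-Heroku server (AWS, Digital Ocean etc.) a standard cron job does the same thing. As a rough sketch, assuming your project lives at /path/to/project and node is on the PATH:

# run the backup every day at 2am
0 2 * * * cd /path/to/project && node run_backup.js >> backup.log 2>&1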

Restoring from the backup

So you have the .dump file backed up to your S3 bucket, and your application goes down which requires a restore. Luckily this is easy! Here’s the command to restore to a local DB:

pg_restore -c -U [DB_USER] -d [DB] -v "./[FILE].dump" -W

This can be adapted to restore to your remote DB easily by performing the command on the server.
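Alternatively, pg_restore can connect to the remote database directly from your local machine by adding a host flag, something like this (using the same placeholders as above):

pg_restore -c -U [DB_USER] -h [DB_HOST] -d [DB] -v "./[FILE].dump" -W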

Improvements

One major consideration with backing up databases is security. This has not been a major part of this post, but in a production environment it's worth encrypting your database before it's sent via SFTP to your remote location. This will not only ensure its safe arrival at the remote location, but also mean it can't be tampered with once it's on the server. This can be accomplished using GnuPG (PGP), and is something I'll look to cover in another post soon.
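
As a rough sketch of the idea, GnuPG can symmetrically encrypt the dump with a passphrase before it's uploaded, and decrypt it again when a restore is needed (the filenames here are just examples):

# encrypt - prompts for a passphrase and produces export.dump.gpg
gpg --symmetric --cipher-algo AES256 export.dump

# decrypt back to the original dump file
gpg --output export.dump --decrypt export.dump.gpg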

