Why Backup Strategies Matter


Data loss happens. Hardware fails, software bugs corrupt data, human errors delete tables, and ransomware encrypts databases. A robust backup strategy is your last line of defense, and what ultimately matters is the ability to restore, not the act of creating backups.


The RPO and RTO Framework


| Metric | Definition | Example |
|--------|------------|---------|
| RPO (Recovery Point Objective) | Maximum acceptable data loss | 1 hour = lose at most 1 hour of data |
| RTO (Recovery Time Objective) | Maximum acceptable downtime | 4 hours = must recover within 4 hours |
| WRT (Work Recovery Time) | Time to verify restored data and make the application functional after systems are back | RTO + WRT must fit within the maximum tolerable downtime |
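Here is how the RPO translates into a schedule. With periodic backups, a failure just before the next scheduled run loses roughly one full interval of writes, so the backup interval must not exceed the RPO. Below is a minimal Bash sketch of that check. It assumes both values are whole minutes and ignores how long the backup itself takes. `meets_rpo` is a hypothetical helper, not part of any tool:

```shell
# Hypothetical helper: does a backup taken every INTERVAL minutes satisfy an RPO of RPO minutes?
# Worst case: failure just before the next backup, losing almost a full interval of writes.
meets_rpo() {
    local interval_min=$1 rpo_min=$2
    if (( interval_min <= rpo_min )); then
        echo "OK: worst-case loss ~${interval_min} min <= RPO ${rpo_min} min"
        return 0
    else
        echo "FAIL: worst-case loss ~${interval_min} min > RPO ${rpo_min} min"
        return 1
    fi
}
```

For example, `meets_rpo 1440 60` reports a failure because daily backups cannot meet a one-hour RPO. Tight RPOs therefore call for continuous log archiving, such as the WAL archiving described later.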


Setting Targets


| Application Type | RPO | RTO |
|-----------------|-----|-----|
| Banking transaction system | < 1 minute | < 5 minutes |
| E-commerce platform | < 5 minutes | < 1 hour |
| Content management system | < 1 hour | < 4 hours |
| Analytics/reporting | < 24 hours | < 24 hours |
| Development/staging | Best effort | < 48 hours |
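An RTO is only credible if restores are timed during drills. Below is a minimal Bash sketch that runs any restore command and compares its wall-clock duration with an RTO budget in seconds. `within_rto` is a hypothetical helper. It measures only the restore step, not detection time or the application-side work covered by WRT:

```shell
# Hypothetical helper: run a restore command and report whether it finished within the RTO.
# Usage: within_rto RTO_SECONDS command [args...]
# Returns 0 if within budget, 1 if too slow, 2 if the command itself failed.
within_rto() {
    local rto_s=$1; shift
    local start end elapsed
    start=$(date +%s)
    if ! "$@"; then
        echo "FAIL: restore command exited with an error"
        return 2
    fi
    end=$(date +%s)
    elapsed=$(( end - start ))
    if (( elapsed <= rto_s )); then
        echo "OK: restore took ${elapsed}s (RTO ${rto_s}s)"
    else
        echo "FAIL: restore took ${elapsed}s, exceeding RTO ${rto_s}s"
        return 1
    fi
}
```

For instance, `within_rto 14400 pg_restore -d test_db /backups/daily/full_latest.dump` checks a restore against a 4-hour RTO.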


Backup Types


Full Backup


A complete copy of the entire database. Slowest to create, fastest to restore.



# PostgreSQL full backup

pg_dump -h localhost -U app_user -F c -f /backups/daily/full_$(date +%Y%m%d).dump mydb



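# MySQL full backup (--single-transaction takes a consistent InnoDB snapshot without locking tables)

mysqldump -h localhost -u app_user -p --single-transaction --routines mydb > /backups/daily/full_$(date +%Y%m%d).sql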



# MongoDB full backup

mongodump --host localhost --db mydb --out /backups/daily/$(date +%Y%m%d)
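If a job dies midway while writing a dump straight to its final name, it can leave a truncated file that looks like a valid backup. Below is a minimal Bash sketch of a safer pattern. It assumes the dump tool can write to stdout (pg_dump does when `-f` is omitted, and mysqldump always does) and that GNU coreutils `sha256sum` is available. `atomic_dump` is a hypothetical helper:

```shell
# Hypothetical helper: run a dump command that writes to stdout, store the result
# atomically (temp file in the same directory + rename) and record a SHA-256
# checksum next to it as DEST_FILE.sha256.
# Usage: atomic_dump DEST_FILE command [args...]
atomic_dump() {
    local dest=$1; shift
    local dir name tmp
    dir=$(dirname "$dest")
    name=$(basename "$dest")
    tmp=$(mktemp "$dir/.${name}.XXXXXX") || return 1
    # A failed dump must never replace or masquerade as a good backup
    if ! "$@" > "$tmp"; then
        rm -f "$tmp"
        echo "dump failed: $*" >&2
        return 1
    fi
    # rename within one filesystem is atomic
    mv "$tmp" "$dest"
    # Checksum is written relative to the backup directory so it can be verified in place
    (cd "$dir" && sha256sum "$name" > "$name.sha256")
}
```

For example, `atomic_dump /backups/daily/full_$(date +%Y%m%d).dump pg_dump -h localhost -U app_user -F c mydb` leaves no partial file under the final name if the dump fails.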


Incremental Backup


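Backs up only data changed since the last backup of any type. Fastest to create, slowest to restore, because a restore needs the last full backup plus every incremental taken after it, applied in order.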


Differential Backup


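Backs up data changed since the last full backup. Faster to restore than incremental, because a restore needs only the last full backup plus the latest differential. The trade-off is that each differential grows until the next full backup.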


| Metric | Full | Incremental | Differential |
|--------|------|-------------|--------------|
| Backup size | Largest | Smallest | Medium |
| Backup time | Slowest | Fastest | Medium |
| Restore time | Fastest | Slowest | Medium |
| Frequency | Weekly | Daily/hourly | Daily |
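The restore-time differences in the table come from how many backups must be chained together. Below is a minimal Bash sketch that prints which backups a restore of the newest one needs. It assumes a hypothetical inventory of backups listed oldest first as `TYPE NAME` lines, where TYPE is `full`, `incr`, or `diff`. `restore_chain` is a hypothetical helper:

```shell
# Hypothetical helper: read "TYPE NAME" lines in chronological order (TYPE is
# full, incr or diff) and print the backups needed to restore the newest one,
# oldest first. Walks backwards from the newest backup.
restore_chain() {
    local -a types=() names=() chain=()
    local t n i need_any=1 found=0
    while read -r t n || [ -n "$t" ]; do
        [ -n "$t" ] || continue
        types+=("$t"); names+=("$n")
    done
    for (( i=${#types[@]}-1; i>=0; i-- )); do
        t=${types[i]}
        # While need_any is set, every backup is needed (an incremental builds on
        # the backup just before it); after a differential, only the full counts.
        if (( need_any )) || [ "$t" = full ]; then
            chain=("${names[i]}" "${chain[@]}")
        fi
        case $t in
            full) found=1; break ;;
            diff) need_any=0 ;;
            incr) ;;
            *) echo "unknown backup type: $t" >&2; return 1 ;;
        esac
    done
    if (( ! found )); then
        echo "no full backup found" >&2
        return 1
    fi
    printf '%s\n' "${chain[@]}"
}
```

For example, an inventory of `full sun`, `incr mon`, `diff tue`, `incr wed` yields `sun`, `tue`, `wed`. Wednesday's incremental builds on Tuesday's differential, which needs only Sunday's full backup.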


Point-in-Time Recovery (PITR)


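PITR restores a base backup and then replays archived WAL (Write-Ahead Log) segments up to a chosen moment. This lets you return the database to any point between the base backup and the newest archived WAL. MySQL achieves the same with binary logs.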


PostgreSQL WAL Archiving



# postgresql.conf

wal_level = replica

archive_mode = on

archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f'

archive_timeout = 60  # On an active server, switch WAL segments at least every 60 seconds so archive lag stays around a minute



# Full base backup

pg_basebackup -h localhost -D /backups/base/$(date +%Y%m%d) -X stream -P
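A silently failing `archive_command` leaves a gap that only shows up during a restore. Below is a minimal Bash sketch of a freshness check on the archive directory used above. `check_wal_archive` is a hypothetical helper. PostgreSQL does not switch segments on an idle server even with `archive_timeout` set, so a warning on a quiet database is not necessarily a fault:

```shell
# Hypothetical monitoring check: warn if no WAL file has reached the archive
# within MAX_MINUTES. Returns 0 (fresh), 1 (stale) or 2 (directory missing).
check_wal_archive() {
    local dir=$1 max_min=$2
    if [ ! -d "$dir" ]; then
        echo "CRITICAL: archive directory $dir missing"
        return 2
    fi
    # -mmin -N matches files modified less than N minutes ago
    if [ -n "$(find "$dir" -type f -mmin "-$max_min" -print -quit)" ]; then
        echo "OK: WAL archived within the last ${max_min} min"
        return 0
    fi
    echo "WARNING: no WAL archived in the last ${max_min} min (check archive_command)"
    return 1
}
```

For example, `check_wal_archive /backups/wal 10` could run from cron or a monitoring agent. The threshold should sit comfortably above `archive_timeout`.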


Restoring to a Point in Time



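# 1. Stop PostgreSQL, move the old data directory aside, and restore the base backup
pg_ctl stop -D /var/lib/postgresql/16/main
mv /var/lib/postgresql/16/main /var/lib/postgresql/16/main.old
cp -a /backups/base/20260511 /var/lib/postgresql/16/main

# 2. Configure recovery. PostgreSQL 12+ no longer reads recovery.conf: add these
#    settings to postgresql.conf (or postgresql.auto.conf), then create recovery.signal
restore_command = 'cp /backups/wal/%f %p'
recovery_target_time = '2026-05-11 14:23:45'
recovery_target_action = promote

touch /var/lib/postgresql/16/main/recovery.signal

# 3. Start PostgreSQL; it replays WAL up to the target time, then promotes
pg_ctl start -D /var/lib/postgresql/16/main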
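Replay can only move forward from a base backup, so the base backup must finish before the recovery target. Below is a minimal Bash sketch for picking one. It assumes base backups are stored as `/backups/base/YYYYMMDD`, like the `pg_basebackup` command above. `pick_base_backup` is hypothetical. It conservatively skips backups from the target day itself, because a date-only name cannot show whether that backup finished before the target time:

```shell
# Hypothetical helper: print the newest base backup directory (named YYYYMMDD)
# dated strictly before TARGET_DAY. Same-day backups are skipped because the
# name does not say whether the backup finished before the target time.
pick_base_backup() {
    local base_dir=$1 target_day=$2 d name best=""
    for d in "$base_dir"/*/; do
        name=$(basename "$d")
        # Ignore anything that is not an 8-digit date (including an unmatched glob)
        [[ $name =~ ^[0-9]{8}$ ]] || continue
        if [[ $name < $target_day ]] && [[ -z $best || $name > $best ]]; then
            best=$name
        fi
    done
    if [ -z "$best" ]; then
        echo "no base backup older than $target_day in $base_dir" >&2
        return 1
    fi
    echo "$base_dir/$best"
}
```

For example, `pick_base_backup /backups/base 20260511` returns the newest directory dated 20260510 or earlier.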


Automated PITR with pgBackRest



# pgbackrest.conf

[global]

repo1-path=/backups/pgbackrest

repo1-retention-full=4

repo1-cipher-type=aes-256-cbc

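# Example value only: use a long random passphrase and make this file readable only by the postgres user
repo1-cipher-pass=secure_backup_password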



[main]

pg1-path=/var/lib/postgresql/16/main



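# Initialize the stanza once before the first backup (postgresql.conf must set archive_command = 'pgbackrest --stanza=main archive-push %p')

pgbackrest --stanza=main stanza-create



# Create full backup

pgbackrest --stanza=main --type=full backup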



# Create incremental backup

pgbackrest --stanza=main --type=incr backup



# List backups

pgbackrest --stanza=main info



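# Restore to specific timestamp (stop PostgreSQL first; --delta reuses the existing data directory, and --target-action=promote ends recovery instead of the default pause)

pgbackrest --stanza=main --delta --type=time --target="2026-05-11 14:23:45" --target-action=promote restore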
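The comparison table suggests weekly full backups with more frequent incrementals. Below is a minimal Bash sketch that picks the pgBackRest `--type` for the current day. It assumes one full backup per week on a configurable ISO weekday, with Sunday (7) as the default. `backup_type_for_day` is a hypothetical helper:

```shell
# Hypothetical helper: print "full" on the weekly full-backup day and "incr"
# otherwise. Days are ISO weekdays (1 = Monday ... 7 = Sunday).
# Usage: backup_type_for_day [DAY] [FULL_DAY]   (DAY defaults to today)
backup_type_for_day() {
    local dow=${1:-$(date +%u)} full_day=${2:-7}
    if (( dow == full_day )); then
        echo full
    else
        echo incr
    fi
}
```

A single daily cron entry can then run `pgbackrest --stanza=main --type="$(backup_type_for_day)" backup`. With `repo1-retention-full=4`, this keeps roughly four weeks of full backups along with the incrementals that depend on them.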


Cloud Backup Strategies


AWS RDS Automated Backups



# Automated backups are enabled by default (retention defaults to 7 days when
# the instance is created in the console, 1 day via the CLI/API)
aws rds modify-db-instance \
    --db-instance-identifier mydb \
    --backup-retention-period 35  # 35 days is the maximum

# Manual snapshot
aws rds create-db-snapshot \
    --db-instance-identifier mydb \
    --db-snapshot-identifier mydb-pre-migration-snapshot

# Restore from snapshot (creates a new instance)
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier mydb-restored \
    --db-snapshot-identifier mydb-pre-migration-snapshot

# Point-in-time restore (also creates a new instance)
aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier mydb \
    --target-db-instance-identifier mydb-restored \
    --restore-time "2026-05-11T14:23:45Z"
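Snapshot identifiers must be unique and follow RDS naming rules. Also, `create-db-snapshot` returns before the snapshot is usable. Below is a minimal Bash sketch that handles both, assuming the AWS CLI is installed and configured. `rds_snapshot_id` and `snapshot_and_wait` are hypothetical helpers. `aws rds wait db-snapshot-available` is the CLI's built-in waiter, which gives up after a fixed number of polls:

```shell
# Hypothetical helper: build a timestamped snapshot identifier and check it against
# RDS naming rules (1-255 letters, digits or hyphens; starts with a letter;
# no trailing hyphen and no two consecutive hyphens).
rds_snapshot_id() {
    local prefix=$1 stamp=${2:-$(date -u +%Y%m%d-%H%M%S)}
    local id="${prefix}-${stamp}"
    if [[ ${#id} -gt 255 || ! $id =~ ^[A-Za-z][A-Za-z0-9-]*$ || $id == *--* || $id == *- ]]; then
        echo "invalid snapshot identifier: $id" >&2
        return 1
    fi
    echo "$id"
}

# Hypothetical wrapper: create a manual snapshot and block until RDS reports it available.
snapshot_and_wait() {
    local instance=$1 snap
    snap=$(rds_snapshot_id "$instance-manual") || return 1
    aws rds create-db-snapshot \
        --db-instance-identifier "$instance" \
        --db-snapshot-identifier "$snap" >/dev/null || return 1
    aws rds wait db-snapshot-available --db-snapshot-identifier "$snap" || return 1
    echo "$snap"
}
```

Calling `snapshot_and_wait mydb` before a migration prints the snapshot name only once the snapshot can actually be restored.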


GCP Cloud SQL



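# Enable PITR on a MySQL instance (binary logging requires automated backups;
# PostgreSQL instances use --enable-point-in-time-recovery instead)
gcloud sql instances patch mydb \
    --enable-bin-log \
    --backup-start-time=23:00

# Create backup
gcloud sql backups create --instance=mydb

# List backups
gcloud sql backups list --instance=mydb

# Restore a backup (positional ID) onto an existing instance; this overwrites its data
gcloud sql backups restore 123456 \
    --restore-instance=mydb-restored \
    --backup-instance=mydb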
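Restoring by a hard-coded backup ID is fragile, so a script usually wants the newest successful backup. Below is a minimal Bash sketch. It assumes `gcloud`'s standard `--sort-by` and `--format` list flags and the `id`, `status`, and `windowStartTime` fields of a backup run. The helper names are hypothetical. As above, restoring overwrites all data on the target instance:

```shell
# Hypothetical helper: read "ID<TAB>STATUS" lines, newest first, and print the
# first backup whose status is SUCCESSFUL. Exits non-zero if there is none.
latest_successful_backup() {
    awk -F'\t' '$2 == "SUCCESSFUL" { print $1; found = 1; exit } END { exit !found }'
}

# Hypothetical wrapper: restore SOURCE's newest successful backup onto TARGET.
restore_latest_backup() {
    local src=$1 target=$2 id
    id=$(gcloud sql backups list --instance="$src" \
            --sort-by=~windowStartTime --format="value(id,status)" \
        | latest_successful_backup) || { echo "no successful backup for $src" >&2; return 1; }
    # Overwrites all data on the target instance
    gcloud sql backups restore "$id" --restore-instance="$target" --backup-instance="$src"
}
```

Skipping failed runs matters because a scheduled backup that failed still appears in the list.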


Backup Verification


A backup that cannot be restored is worthless. Test your backups regularly:



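#!/bin/bash
# Weekly restore test script
set -euo pipefail

# Drop the scratch database on exit, even if a step fails
trap 'dropdb --if-exists test_db' EXIT

# Restore the backup into a fresh test database
dropdb --if-exists test_db
createdb test_db
pg_restore -d test_db /backups/daily/full_latest.dump

# Run integrity checks
psql -d test_db -c "SELECT count(*) FROM information_schema.tables;"
psql -d test_db -c "SELECT count(*) FROM users;"

# Compare record counts with production (-tA prints just the value).
# Rows written after the backup was taken cause a legitimate mismatch,
# so prefer counts recorded at backup time when they are available.
prod_count=$(psql -d production_db -tA -c "SELECT count(*) FROM users")
test_count=$(psql -d test_db -tA -c "SELECT count(*) FROM users")

if [ "$prod_count" -eq "$test_count" ]; then
    echo "Backup verification PASSED"
else
    echo "Backup verification FAILED: count mismatch (production=$prod_count, restored=$test_count)"
    exit 1
fi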
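A checksum check is cheaper than a full restore and can catch truncated or corrupted backup files beforehand. For custom-format dumps, `pg_restore --list FILE` also confirms that the archive's table of contents is readable. Below is a minimal Bash sketch of the checksum check. It assumes a `FILE.sha256` sidecar created in the backup directory with GNU `sha256sum` at backup time. `verify_backup_checksum` is a hypothetical helper:

```shell
# Hypothetical helper: verify a backup against its FILE.sha256 sidecar
# (created with `sha256sum FILE > FILE.sha256` inside the backup directory).
# Returns 0 (intact), 1 (checksum mismatch) or 2 (no sidecar).
verify_backup_checksum() {
    local file=$1 dir name
    dir=$(dirname "$file")
    name=$(basename "$file")
    if [ ! -f "$file.sha256" ]; then
        echo "MISSING: no checksum for $file" >&2
        return 2
    fi
    if (cd "$dir" && sha256sum --quiet -c "$name.sha256"); then
        echo "OK: $file"
    else
        echo "CORRUPT: $file" >&2
        return 1
    fi
}
```

Running `verify_backup_checksum /backups/daily/full_latest.dump` at the top of the restore test turns a silent corruption into an explicit failure before any database work starts.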


Backup Automation



# Backup cron schedule

# Daily full backup at 2 AM

0 2 * * * /usr/local/bin/backup_full.sh

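# WAL archiving runs continuously via archive_command (archive_timeout = 60 forces a segment switch at least every minute on an active server); no cron entry needed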



# Weekly verification

0 6 * * 1 /usr/local/bin/test_restore.sh



# Monthly copy to cold storage

0 4 1 * * /usr/local/bin/archive_to_s3.sh
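If a backup runs longer than its cron interval, the next run can start on top of it. Below is a minimal Bash sketch of a wrapper that uses util-linux `flock -n` to skip overlapping runs and timestamps the start and end of each job. `run_exclusive` is hypothetical, and exit code 75 (EX_TEMPFAIL) for a skipped run is an arbitrary choice:

```shell
# Hypothetical cron wrapper: skip a run if the previous one still holds the lock
# (flock -n fails immediately instead of waiting) and timestamp start/end.
# Usage: run_exclusive LOCKFILE command [args...]
run_exclusive() {
    local lock=$1; shift
    (
        flock -n 9 || { echo "$(date -u +%FT%TZ) skipped: $lock is held" >&2; exit 75; }
        echo "$(date -u +%FT%TZ) start: $*"
        "$@"
        rc=$?
        echo "$(date -u +%FT%TZ) end: $* (exit $rc)"
        exit "$rc"
    ) 9>"$lock"
}
```

Without the helper, the same guard fits in a crontab line (lock path illustrative): `0 2 * * * flock -n /var/lock/backup_full.lock /usr/local/bin/backup_full.sh`.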


3-2-1 Backup Rule


| Rule | Explanation | Implementation |
|------|-------------|----------------|
| 3 copies of data | Production + 2 backups | Live DB + local backup + remote backup |
| 2 different media types | Different failure modes | SSD + tape/cloud storage |
| 1 off-site copy | Disaster recovery | S3, GCS, or another region |
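The rule is easy to check mechanically against an inventory of copies. Below is a minimal Bash/awk sketch. It assumes a hypothetical one-line-per-copy format `NAME MEDIA OFFSITE` (for example, `s3-backup cloud yes`). `check_321` prints the counts and exits non-zero if any part of the rule is unmet:

```shell
# Hypothetical inventory check: one line per copy as "NAME MEDIA OFFSITE"
# (OFFSITE is yes/no). Prints copies/media/offsite counts; exit status 0 only
# if there are >= 3 copies on >= 2 media types with >= 1 off-site.
check_321() {
    awk '
        NF >= 3 { copies++; media[$2] = 1; if ($3 == "yes") offsite++ }
        END {
            for (m in media) kinds++
            printf "copies=%d media=%d offsite=%d\n", copies, kinds, offsite
            exit !(copies >= 3 && kinds >= 2 && offsite >= 1)
        }'
}
```

For example, piping in `live-db ssd no`, `local-backup ssd no`, and `s3-backup cloud yes` prints `copies=3 media=2 offsite=1` and succeeds.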


Disaster Recovery Scenarios


| Scenario | Recovery Method | Example RTO |
|----------|----------------|-----|
| Accidental DELETE | PITR to just before the statement | 30 min |
| Table corruption | Restore full + PITR to last consistent state | 2 hours |
| Entire instance failure | Restore latest full backup on a new instance, then replay archived WAL | 4 hours |
| Region outage | Cross-region replica promotion or backup restore | 1-8 hours |
| Ransomware | Restore from backup before encryption timestamp | 2 hours |
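For the accidental-DELETE case, PostgreSQL 12+ can stop replay just before a target instead of just after it. Below is a minimal Bash sketch that writes the recovery settings into a restored data directory, assuming the WAL archive from the earlier examples. `write_pitr_settings` is hypothetical. It appends to `postgresql.auto.conf` in the data directory, the same file `pg_basebackup -R` writes to, and creates `recovery.signal`. Choose a target time before the DELETE committed. `recovery_target_inclusive = false` additionally excludes any transaction that committed exactly at the target:

```shell
# Hypothetical helper (PostgreSQL 12+): configure PITR in a restored data directory.
# Usage: write_pitr_settings DATADIR 'YYYY-MM-DD HH:MM:SS' [WAL_ARCHIVE_DIR]
write_pitr_settings() {
    local datadir=$1 target=$2 archive=${3:-/backups/wal}
    if [ ! -f "$datadir/PG_VERSION" ]; then
        echo "$datadir does not look like a PostgreSQL data directory" >&2
        return 1
    fi
    # recovery_target_inclusive = false: stop just before, not after, the target
    cat >> "$datadir/postgresql.auto.conf" <<EOF
restore_command = 'cp $archive/%f %p'
recovery_target_time = '$target'
recovery_target_inclusive = false
recovery_target_action = promote
EOF
    # recovery.signal tells PostgreSQL to start in targeted recovery mode
    touch "$datadir/recovery.signal"
}
```

For example, run `write_pitr_settings /var/lib/postgresql/16/main '2026-05-11 14:23:45'` after restoring the base backup, then start PostgreSQL as in the restore steps earlier.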


Summary


Define your RPO and RTO targets before designing a backup strategy. Use full backups for weekly snapshots, continuous WAL archiving for point-in-time recovery, and verify backups regularly by testing restores. Follow the 3-2-1 rule for data protection, automate the entire process, and document recovery procedures so anyone on the team can execute a restore. A backup that has never been tested is not a backup; it is a hope.