How to Copy Production Data into Staging for Better QA

Paulo Carvalho
Mar 7, 2024

At some point in your product's development lifecycle, you will need an accurate staging environment in order to properly QA your releases. Being able to clone your production data is very helpful in these cases.

AI generated image from prompt “datacenter with copy and paste symbol”

Disclaimer: This guide covers how to create a simple copy, without anonymizing the data or porting over third-party integrations.

For this tutorial, we will assume you have a deployment similar to the one outlined below.

Clone Application Database

In this series of steps we will create a new RDS instance from a snapshot of the production instance and alter some properties so it can be used with the staging server.

Step 1 — Create Production Snapshot

Select your production cluster and click “take snapshot”.

Select the production DB cluster and give your snapshot a memorable name such as <your-project>-production-clone-<date>.
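If you prefer the command line to the console, the same step can be sketched with the AWS CLI. This is a minimal sketch, assuming an Aurora-style DB cluster (for a single RDS instance the equivalent command is create-db-snapshot). The function names, the run helper and the example identifiers are hypothetical. By default, the script only prints the commands so you can review them before touching production.

```shell
#!/bin/sh
# Sketch: take a manual cluster snapshot from the CLI.
# All identifiers passed in are placeholders you must supply.

# Build a name following the <your-project>-production-clone-<date> pattern.
snapshot_name() {
  project="$1"
  day="${2:-$(date +%Y-%m-%d)}"
  printf '%s-production-clone-%s\n' "$project" "$day"
}

# Print the command by default; set DRY_RUN=0 to actually call AWS.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_snapshot() {
  cluster_id="$1"
  snap_id="$2"
  run aws rds create-db-cluster-snapshot \
    --db-cluster-identifier "$cluster_id" \
    --db-cluster-snapshot-identifier "$snap_id"
  # Block until the snapshot can be used for a restore.
  run aws rds wait db-cluster-snapshot-available \
    --db-cluster-snapshot-identifier "$snap_id"
}

# Example (prints the two commands without running them):
#   create_snapshot my-prod-cluster "$(snapshot_name my-project)"
```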

Step 2 — Restore New DB Server from Snapshot

Select “Restore snapshot” to start the creation of your new RDS database with the production data.

On the next screen, configure your new DB cluster paying attention to the following attributes:

Instance type: Your staging instance will likely be much smaller than your production instance.

The VPC: This new instance should go in your staging VPC even though it will default to the production VPC.
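The two attributes above map onto CLI flags roughly as follows. This is a sketch under assumptions: the engine (aurora-postgresql) and the instance class (db.t3.medium) are guesses for a small staging setup, and the function name, the run helper and the "-1" instance suffix are hypothetical. In RDS, the VPC is selected indirectly through the DB subnet group, so pointing the restore at a subnet group that lives in your staging VPC is what moves the new cluster there. A cluster restored from the CLI has no instances, so one is created afterwards.

```shell
#!/bin/sh
# Sketch: restore a snapshot into a new, smaller cluster in the staging VPC.

# Print the command by default; set DRY_RUN=0 to actually call AWS.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

restore_staging_cluster() {
  snap_id="$1"        # snapshot created in Step 1
  new_cluster="$2"    # identifier for the new staging cluster
  subnet_group="$3"   # DB subnet group inside the staging VPC
  sg_id="$4"          # security group in the staging VPC
  engine="${5:-aurora-postgresql}"     # assumption, adjust to your engine
  instance_class="${6:-db.t3.medium}"  # assumption, smaller than production

  run aws rds restore-db-cluster-from-snapshot \
    --db-cluster-identifier "$new_cluster" \
    --snapshot-identifier "$snap_id" \
    --engine "$engine" \
    --db-subnet-group-name "$subnet_group" \
    --vpc-security-group-ids "$sg_id"

  # Add one instance to the restored cluster.
  run aws rds create-db-instance \
    --db-instance-identifier "${new_cluster}-1" \
    --db-cluster-identifier "$new_cluster" \
    --db-instance-class "$instance_class" \
    --engine "$engine"
}

# Example (prints the commands without running them):
#   restore_staging_cluster my-snapshot staging-db staging-subnets sg-0123456789
```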

Step 3 — [Optional] Configure Access to DB Server

Your new database cluster is likely inside a private subnet of a VPC. Therefore, if you want to be able to access it externally, you should configure your VPN, bastion or whichever other service you use to reach it.

In my case, I use Twingate, where you can create a new resource that points at the new database host.

You can now connect to your new staging DB with the command below:

PGPASSWORD=YOUR_PASSWORD_HERE psql -h YOUR_HOST_NAME_PREFIX.rds.amazonaws.com -U YOUR_USER -p 5432 postgres

Step 4 — [Optional] Rename Database

The database inside your new server keeps the name it had in production. Depending on the convention you follow, you may need to rename it as shown below.

ALTER DATABASE YOUR_OLD_DB_NAME_HERE RENAME TO YOUR_NEW_DB_NAME_HERE;
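PostgreSQL refuses to rename the database you are currently connected to, or one that still has other sessions connected to it. The sketch below, with a hypothetical helper name, generates SQL that first disconnects other sessions and then renames the database. It assumes you run it while connected to the postgres database, as in the connection command above, and that your role is allowed to terminate those sessions.

```shell
#!/bin/sh
# Sketch: emit SQL that closes other sessions, then renames the database.
# Names are used exactly as given (identifiers are double-quoted).

rename_sql() {
  old_db="$1"
  new_db="$2"
  cat <<EOF
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE datname = '$old_db' AND pid <> pg_backend_pid();
ALTER DATABASE "$old_db" RENAME TO "$new_db";
EOF
}

# Example: review the SQL, then pipe it into psql.
#   rename_sql YOUR_OLD_DB_NAME_HERE YOUR_NEW_DB_NAME_HERE
#   rename_sql YOUR_OLD_DB_NAME_HERE YOUR_NEW_DB_NAME_HERE | \
#     psql -h YOUR_HOST_NAME_PREFIX.rds.amazonaws.com -U YOUR_USER -p 5432 postgres
```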

Step 5 — [Optional] Change Database Owner

As with the database name, you may want staging to connect as a different user, so that the username and password for your production database are not made available to more developers than necessary. The commands below create the new user, make it the owner of the database, and reassign the objects owned by the old user to it.

CREATE USER YOUR_NEW_USERNAME WITH PASSWORD 'YOUR_PASSWORD';
ALTER DATABASE YOUR_NEW_DB_NAME_HERE OWNER TO YOUR_NEW_USERNAME;

\c YOUR_NEW_DB_NAME_HERE

REASSIGN OWNED BY YOUR_OLD_USERNAME TO YOUR_NEW_USERNAME;

Step 6 — [Optional] Point App Servers to New DB

You now have a working staging database that holds a point-in-time snapshot of the production data. You can now configure your application servers to access the new database instead of the old one.
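How you repoint the servers depends on how your application reads its configuration. As one possible sketch, assuming the app takes a PostgreSQL connection URI from an environment variable (the DATABASE_URL name and the helper below are hypothetical), you could look up the new cluster endpoint and build the URI like this. The password is deliberately left out of the URI and should come from your usual secret store.

```shell
#!/bin/sh
# Sketch: build a connection URI for the new staging database.

database_url() {
  user="$1"
  host="$2"
  db="$3"
  port="${4:-5432}"
  printf 'postgresql://%s@%s:%s/%s\n' "$user" "$host" "$port" "$db"
}

# Example: fetch the writer endpoint of the restored cluster, then export
# the URI for the application (cluster identifier is a placeholder).
#   host="$(aws rds describe-db-clusters \
#     --db-cluster-identifier staging-db \
#     --query 'DBClusters[0].Endpoint' --output text)"
#   export DATABASE_URL="$(database_url YOUR_NEW_USERNAME "$host" YOUR_NEW_DB_NAME_HERE)"
```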

Clone S3 Object Store

Step 1 — Move Old Files

Instead of deleting the files you may currently have in staging, a less permanent solution is to move them into a folder with a prefix such as old_. This is possible by running the command below:

aws s3 mv s3://YOUR_STAGING_S3/THE_FOLDER s3://YOUR_STAGING_S3/old_THE_FOLDER --recursive

Step 2 — Copy Files from Production S3 to Staging S3

Now that the staging folder you need has been cleared of data, you can copy the files from the production S3 bucket to the staging S3 bucket with the command below:

aws s3 cp s3://YOUR_PRODUCTION_S3/THE_FOLDER s3://YOUR_STAGING_S3/THE_FOLDER --recursive
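Before copying a large folder, it can be worth previewing what will be transferred. The aws s3 cp and aws s3 mv commands accept a --dryrun flag that lists the operations without performing them. The helper below is a hypothetical sketch that prints the copy command in either mode. It assumes bucket and folder names without spaces, and credentials that can read the production bucket and write to the staging bucket.

```shell
#!/bin/sh
# Sketch: print the aws s3 cp command for one folder.
# Mode "preview" (the default) adds --dryrun; "apply" omits it.

s3_copy_cmd() {
  src="s3://$1/$3"
  dst="s3://$2/$3"
  if [ "${4:-preview}" = "preview" ]; then
    printf 'aws s3 cp %s %s --recursive --dryrun\n' "$src" "$dst"
  else
    printf 'aws s3 cp %s %s --recursive\n' "$src" "$dst"
  fi
}

# Example: inspect the planned copy first, then run it for real.
#   s3_copy_cmd YOUR_PRODUCTION_S3 YOUR_STAGING_S3 THE_FOLDER preview | sh
#   s3_copy_cmd YOUR_PRODUCTION_S3 YOUR_STAGING_S3 THE_FOLDER apply | sh
```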

Conclusion

In this article, we went over cloning a production database and object store into a staging environment.

If you need help with this process or are looking to perform a more complex cloning with data anonymization, external integrations and more, feel free to email me at paulo@avantsoft.com.br.
