Migrating Paperclip Image Assets from a GCP Bucket to AWS S3 + CloudFront

Migrating the storage layer for Rails Paperclip images need not be a hassle or lead to downtime.

Photo by İsmail Enes Ayhan on Unsplash

Introduction

Paperclip is an attachment management library for Rails that, despite being deprecated, is still widely used. In one of my recent projects built on Solidus, we had to migrate our image assets from Google Cloud Storage to Amazon Web Services S3 fronted by a CloudFront CDN. I noticed a lack of guides on how to accomplish this, so I will outline the process below.

Migration

In your environment configuration files (usually development.rb, production.rb, etc.), change your existing Paperclip configuration to the following:

config.paperclip_defaults = {
  storage: :s3,
  s3_region: ENV['AWS_REGION'],
  s3_credentials: {
    access_key_id: ENV['AWS_ACCESS_KEY'],
    secret_access_key: ENV['AWS_SECRET_KEY']
  },
  s3_protocol: :https,
  s3_host_alias: ENV['CLOUDFRONT_DOMAIN'].gsub('https://', ''),
  use_accelerate_endpoint: true,
  bucket: ENV['AWS_S3_IMAGE_BUCKET'],
  s3_headers: {
    'Cache-Control' => 'max-age=315576000'
  }
}

The s3_host_alias setting tells Paperclip to serve images from the CDN domain instead of directly from the S3 bucket. use_accelerate_endpoint can speed up uploads for users who are far from the bucket's region. Setting a Cache-Control header is good practice when serving assets through a CDN.
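As an optional safeguard on top of the configuration above (a sketch of my own, not part of the original setup), you can fail fast at boot if any of the environment variables the configuration relies on is missing, since a blank CLOUDFRONT_DOMAIN or bucket name would silently produce broken URLs:

# Assumed sanity check: raise at boot if a required variable is unset or blank.
%w[AWS_REGION AWS_ACCESS_KEY AWS_SECRET_KEY CLOUDFRONT_DOMAIN AWS_S3_IMAGE_BUCKET].each do |key|
  raise "Missing environment variable: #{key}" if ENV[key].to_s.empty?
end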

In your Paperclip initializer (for example, in config/initializers/storage.rb) add the following lines of code:

Spree::Image.attachment_definitions[:attachment][:path] = 'spree/images/:id/:style/:basename.:extension'
Spree::Image.attachment_definitions[:attachment][:url] = ':s3_alias_url'

The first line configures the file path used to locate images within the storage layer. The second line tells Paperclip to build URLs from the s3_host_alias we configured earlier.
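With both settings in place, a quick Rails console check should show attachment URLs pointing at CloudFront rather than S3. The domain, ID, and filename below are made-up examples:

# Hypothetical console check; :large is one of Spree's default image styles.
Spree::Image.first.attachment.url(:large)
# => "https://d1234abcd.cloudfront.net/spree/images/42/large/photo.jpg"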

To copy the images from the GCP bucket to S3 we will use the gsutil tool, which ships with the Google Cloud SDK; you can install it by following Google's installation instructions. To provide the S3 credentials to the tool, create a ~/.boto file like the one below:

[Credentials]
aws_access_key_id = AWS_ACCESS_KEY_HERE
aws_secret_access_key = AWS_SECRET_KEY_HERE
[s3]
use-sigv4=True
host=s3.YOUR_REGION_HERE.amazonaws.com
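Note that gsutil also needs read access to the Google Cloud bucket. If you are running it from a fresh machine (such as the AWS virtual machine suggested below), you will likely need to authenticate with the Cloud SDK first:

gcloud auth login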

With the file created, run the following command to sync up the buckets:

gsutil -m rsync -r gs://YOUR_GCP_BUCKET/ s3://YOUR_S3_BUCKET/

I recommend running this command from a virtual machine in AWS, since a cloud VM's bandwidth will make the transfer considerably faster than a local machine's.
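If you want to preview the transfer before running it for real, gsutil rsync supports a dry-run flag that lists the operations without performing them:

gsutil -m rsync -r -n gs://YOUR_GCP_BUCKET/ s3://YOUR_S3_BUCKET/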

Now that all your images are available in the new bucket, you can deploy the configuration changes from the first step.

Since some time may have passed between copying the images and deploying the code that uses the new storage layer, new images may have been uploaded to the old bucket in the meantime. To address this, rerun the sync command from the previous step. Because rsync only copies objects that are missing from the destination, only the recently added images will be transferred.
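To sanity-check that the two buckets match after the final sync, you can compare their summary sizes; gsutil can read both gs:// and s3:// URLs once the ~/.boto file from earlier is in place:

gsutil du -s gs://YOUR_GCP_BUCKET/
gsutil du -s s3://YOUR_S3_BUCKET/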

Conclusion

The above migration process should be able to move all images from GCP to AWS and configure your server to use the new storage layer without downtime.
