Basics of Migrating to Object Storage

Amey Anekar
Dec 29, 2020

Storing unstructured data in the form of files is a basic requirement for all web applications. This data can include images, documents, videos, music, etc. Traditionally, application developers store files like these on the server’s local filesystem.

It meets the immediate requirement, but scaling up with a storage model like this can be difficult, and there are a few caveats to this method of file storage.

Let’s consider this situation: You are hosting a web application which is running on a Linux server on the cloud. Your application requires users to upload a profile picture and resume. You have been storing these user documents on the local filesystem in the following directory structure:

```
docs/
├── username1/
│   ├── profile.png
│   └── resume.pdf
├── username2/
│   ├── profile.png
│   └── resume.pdf
└── username3/
    ├── profile.png
    └── resume.pdf
```

Your application has gained sudden traction and you have already resized the server’s storage volume to accommodate the spike in new user uploads. But how long will you keep resizing? Moreover, concurrent reads and writes to the local disk will hamper the server’s performance, degrading the experience for your application’s users.

In such a situation, a cloud object storage solution is the optimal choice.

Basics of Object Storage

Object storage allows you to save unstructured data in a scalable manner. Technically, the difference between object storage and filesystem storage is that object storage manages your data as binary objects and stores them in a flat structure, whereas a filesystem stores your data in a hierarchy.

The absence of this hierarchy not only makes it easier to retrieve data (using its metadata) but also much easier to scale up as your data requirements grow.

When using object storage, it is a common convention to include forward slashes in the file name (known as the ‘key’ in object storage parlance) to mimic a directory structure.

Say, for example, you want to move the local file docs/username1/profile.png to object storage. To maintain the intuitive directory structure, all you have to do is set the key of the object to ‘docs/username1/profile.png’ when uploading it.

Technically, the path separators don’t make much of a difference to the object storage provider. The file will simply be stored flat, alongside the rest of your files.

However, if you browse these files through the AWS web console, it shows a friendly directory structure based on the path separators you have supplied. The console also provides an option to create a folder, but this is only a virtual folder: the hierarchy is not actually maintained by the cloud provider when storing the files.

Move Your Files to an Object Storage System

Let’s take a look at an example of moving your local files to the cloud. AWS provides SDKs for most of the popular programming languages. We will be using Node.js for this example.

Steps to move your files to the cloud:

  1. Create an S3 bucket
  2. Get your AWS account credentials
  3. Initialize your project (npm init) and install the AWS SDK package (npm install aws-sdk)
  4. Write your code

Suppose you have created the bucket my-user-docs and want to move your local files into it. The code below defines three functions:

  1. getFiles() => Fetches all files in a directory recursively
  2. uploadFile() => Uploads a file to S3
  3. uploadFolder() => Calls getFiles() to get file list and calls uploadFile() on them iteratively
```javascript
const AWS = require("aws-sdk");
const fs = require("fs");
const path = require("path");

const s3 = new AWS.S3({
    secretAccessKey: "aws_secret_access_key",
    accessKeyId: "aws_access_key_id"
});

const Bucket = "my-user-docs";

// Fetch all files from a folder recursively (returns absolute paths)
async function getFiles(dir) {
    const dirents = await fs.promises.readdir(dir, { withFileTypes: true });
    const files = await Promise.all(dirents.map((dirent) => {
        const res = path.resolve(dir, dirent.name);
        return dirent.isDirectory() ? getFiles(res) : res;
    }));
    return Array.prototype.concat(...files);
}

// Call getFiles() to fetch all files and upload them one by one
async function uploadFolder(folderName, folderPath) {
    const files = await getFiles(folderPath);
    for (const filePath of files) {
        // Derive the object key from the path relative to the folder
        const relativePath = path.relative(folderPath, filePath);
        const file = fs.readFileSync(filePath);
        await uploadFile(`${folderName}/${relativePath}`, file);
    }
}

// Main upload logic: upload a single file to the S3 bucket
async function uploadFile(Key, Body) {
    const params = {
        Bucket,
        Key,
        ACL: "private",
        Body
    };
    await s3.upload(params).promise();
}

uploadFolder("docs", "/var/www/docs");
```

To access the uploaded files from object storage, you can use the getObject() method provided by AWS SDK:

```javascript
const s3 = new AWS.S3({
    secretAccessKey: "aws_secret_access_key",
    accessKeyId: "aws_access_key_id"
});

const options = {
    Bucket: "my-user-docs",
    Key: "docs/username1/profile.png"
};

s3.getObject(options, function (err, data) {
    // data.Body contains your file's contents.
    // You can save it to the local filesystem using fs,
    // or send it in response to an HTTP request using
    // res.send(data.Body)
});
```

Additional Advantages of Object Storage

In addition to being scalable, some more advantages of the Object Storage model are:

  1. Data Versioning: you can retain older versions of a file even after it has been deleted or replaced.
  2. Fine-grained Access Control: you can configure ACLs for your buckets and objects, giving you better control over the security of your data.
  3. Encryption: most cloud storage providers offer encryption capabilities for your files.
  4. Reduced Costs: storing data in object storage is generally cheaper than provisioning equivalent block storage for a server.
