A recent report from IDG states that unstructured data is growing at a rate of 62% per year. This presents several challenges:
- Scale: Traditional approaches to storage struggle to store more than a couple of petabytes effectively.
- Backup: As our data grows, so does our backup window. How do we handle a 62% year-on-year increase in the time it takes to complete a backup?
- Cost: How do we store more data when the budget is growing at only 5-10%, or even shrinking?
To understand why object storage is different, we first need to understand the two traditional storage methodologies: block storage and file storage.
Fundamentally, block storage treats all data as glorified zeros and ones. Storage is read and written in fixed-size 'blocks' of these zeros and ones, and block storage systems present and manipulate the storage at that level. Typically, block storage is used for VMware farms and databases.
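To make that concrete, here is a minimal Python sketch (all names and the 4 KiB block size are illustrative) of what a block device presents: numbered, fixed-size runs of bytes, with no notion of files or folders.

```python
BLOCK_SIZE = 4096  # a common block size; real devices vary (512 B, 4 KiB, ...)

# A tiny stand-in for a raw block device: 8 blocks of zeros.
disk = bytearray(BLOCK_SIZE * 8)

def write_block(block_num: int, data: bytes) -> None:
    """Write into one numbered block; the device knows nothing about files."""
    assert len(data) <= BLOCK_SIZE
    start = block_num * BLOCK_SIZE
    disk[start:start + len(data)] = data

def read_block(block_num: int) -> bytes:
    """Read one whole block of zeros and ones."""
    start = block_num * BLOCK_SIZE
    return bytes(disk[start:start + BLOCK_SIZE])

write_block(2, b"hello")
print(read_block(2)[:5])  # b'hello'
```

Everything above the device, a file system, a database, a VMFS datastore, is responsible for deciding what those blocks mean.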
File storage abstracts block storage in a way that is closer to how humans access data. The files and folders we deal with every day are grouped and stored as such on the storage array. Each file is made up of the same ones and zeros as block storage, but instead of being addressed as fixed 'blocks' they are stored in groups making up each file. Information about each file is called metadata, and it is kept in a separate database on the file storage device. The challenge with file storage is that as the system grows, the metadata becomes a bottleneck. Every time someone accesses a file, the metadata table is consulted, and at scale it stops being able to cope. The other challenge is that if you want multiple copies of your file system, or want to add more devices to your array, you need a copy of the metadata on each device. Keeping that metadata in sync across all the devices creates a further bottleneck.
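A minimal sketch of that separation (all names are illustrative, not any real file system's design): a metadata table maps paths to block locations, and every single read has to go through it, which is exactly the table that becomes the bottleneck and must be replicated in sync across devices.

```python
metadata = {}   # path -> list of block numbers (the separate metadata database)
blocks = {}     # block number -> bytes (the underlying block storage)
next_block = 0

def write_file(path: str, data: bytes, block_size: int = 4) -> None:
    """Split the file into blocks and record their locations in the metadata table."""
    global next_block
    locations = []
    for i in range(0, len(data), block_size):
        blocks[next_block] = data[i:i + block_size]
        locations.append(next_block)
        next_block += 1
    metadata[path] = locations

def read_file(path: str) -> bytes:
    # The metadata lookup happens on *every* access - this central table
    # is what stops coping as the system scales.
    return b"".join(blocks[n] for n in metadata[path])

write_file("/docs/report.txt", b"quarterly numbers")
print(read_file("/docs/report.txt"))  # b'quarterly numbers'
```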
Object storage stores the metadata with the actual file, which removes file storage's central metadata bottleneck. The trade-off of keeping the metadata with each object is that scanning across metadata becomes very slow. For example, searching an object storage system for all .mp3 files with Taylor Swift tagged as the artist would be very slow. (Yes, I like Taylor Swift. Don't judge me!)
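The trade-off is easy to see in a sketch (again, illustrative names only): with the metadata attached to each object and no central index, a query like the one above has to touch every object in the store.

```python
store = {}  # object key -> (data, metadata dict stored alongside the object)

def put(key: str, data: bytes, **meta) -> None:
    """Store the object and its metadata together - no separate database."""
    store[key] = (data, meta)

def find_by_artist(artist: str) -> list:
    # There is no index to consult: we must scan every object's metadata.
    return [key for key, (_, meta) in store.items()
            if meta.get("artist") == artist]

put("song1.mp3", b"...", artist="Taylor Swift", format="mp3")
put("song2.mp3", b"...", artist="Someone Else", format="mp3")
put("doc1.txt", b"...", author="Alice")

print(find_by_artist("Taylor Swift"))  # ['song1.mp3'] - found by full scan
```

Reads and writes of a known key stay fast, which is why the design scales so well; it is only metadata *queries* that pay the full-scan price.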
The advantage of having no central metadata database is that you can scale the system with hundreds of access nodes and store exabytes of data (1,000PB = 1EB). It's also much more cost effective, because you don't need expensive SSDs to hold the metadata.
Since 1996, object storage has grown and matured into a force to be reckoned with. The OpenStack Foundation has created the Swift access protocol, and Amazon has popularised its proprietary S3 protocol. As file storage has hit its limitations in recent years, object storage has blossomed, with 12 and 18 products being evaluated by Gartner and IDC respectively.
Sometimes, we need to go back to basics to take a step forward. Object storage went back to basics 20 years ago and we now have a solution that can scale to exabytes, doesn't need to be backed up and is very cost effective.