Media Storage
Media data (videos, images, etc.) consists of unstructured binary content. Storing such data in a traditional database is not ideal. Instead, it should be read and stored directly via a standard file system interface.
Media data is especially critical for client-facing services and often serves as the backbone of many applications, such as Netflix and YouTube. In this section, we will explore common approaches to managing media data effectively.
File Storage
File Storage refers to a system that exposes storage as a shared file system over the network, allowing compute and storage resources to scale independently.
It is well-suited for high-performance and low-latency applications, enabling them to access complete files directly over the network. Moreover, multiple clients (either services or end-users) can share the same storage backend.
For example, two services share the same file storage system.
To improve availability and read performance, we can add replicas.
However, this setup resembles the Master-Slave model, where the primary server becomes a single point of failure, reducing overall service availability and write performance.
Object Storage
Although a relatively new technology, Object Storage has been rapidly gaining popularity.
The key advantage of Object Storage lies in its distributed architecture. Files, now referred to as objects, are independent and autonomously distributed across multiple servers. This results in a highly available and fault-tolerant system.
Let’s explore how Object Storage is implemented in practice.
Object Distribution
As discussed in the Peer-to-peer architecture, we aim to distribute objects across multiple servers. This allows concurrent writes and improves scalability.
Additionally, each server maintains replicas to increase fault tolerance. For example, two servers continuously replicate data to one another.
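To make the placement idea concrete, here is a minimal sketch, assuming a fixed server list, two replicas per object, and simple hash-based placement (all of these are illustrative assumptions; real systems typically use consistent hashing or a dedicated placement service):

```python
import hashlib

SERVERS = ["server-1", "server-2", "server-3", "server-4"]  # hypothetical cluster
REPLICAS = 2  # each object is written to two servers

def pick_servers(object_key: str) -> list[str]:
    """Hash the key to choose a primary server, then treat the next
    server(s) in the list as replicas."""
    digest = int(hashlib.md5(object_key.encode()).hexdigest(), 16)
    primary = digest % len(SERVERS)
    return [SERVERS[(primary + i) % len(SERVERS)] for i in range(REPLICAS)]

print(pick_servers("Team/Docs/doc.md"))  # two servers holding the object and its replica
```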
Object Naming
This distributed model complicates how we interact with objects.
Traditionally, we access files using a hierarchical path structure like /Team/Docs/doc.md.
Without a centralized file system, there’s no inherent relationship between objects (e.g., directories or siblings).
To simulate the familiar file system structure, Object Storage systems often include the full path in the object’s key, for example:
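The keys themselves might look like this (hypothetical keys for illustration; the exact naming convention varies by system):

```
Team/Docs/doc.md
Team/Docs/design.png
Team/Reports/q1.pdf
```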
This is merely a naming convention. Operations like listing files in a folder still require scanning across multiple servers.
Chunking
Simply distributing objects isn’t sufficient. Unlike typical database records, object sizes can vary dramatically. Large objects demand more storage and processing resources, which can lead to imbalances across servers.
Object Chunking
To address this, we can divide objects into fixed-size units called chunks. Chunking not only balances resource usage but also allows parallel read/write operations across servers.
For instance, if the configured chunk size is 100MB, a 200MB object will be split into two chunks, which can be stored on different servers:
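A minimal sketch of this fixed-size splitting (the 100MB value mirrors the example above; the function name is ours, not a standard API):

```python
CHUNK_SIZE = 100 * 1024 * 1024  # 100MB, matching the example above

def split_into_chunks(data: bytes) -> list[bytes]:
    """Split an object into fixed-size chunks; the last chunk may be smaller."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

chunks = split_into_chunks(bytes(200 * 1024 * 1024))  # a 200MB object
print(len(chunks))  # 2 -- each chunk can be placed on a different server
```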
However, this approach has some trade-offs:
- Using small chunk sizes improves load balancing but results in too many chunks spread across servers. Retrieving an object then requires significant coordination and computing effort. For example, an object is distributed across four servers, requiring queries to all of them for retrieval.
- On the other hand, using large chunk sizes reduces the number of chunks but can create resource imbalances: objects smaller than the chunk size never benefit from splitting. For example, several small objects may end up stored inefficiently on the same server.
Object Storage serves diverse clients with a wide range of file types, making it exceptionally difficult to define a single, optimal chunk size for all use cases.
Chunk Packing
To achieve better control, instead of slicing user objects arbitrarily, we define fixed-size system chunks and pack multiple objects into each chunk.
In this model, a chunk is a system-level file containing multiple objects. If an object exceeds the chunk size, it’s split across multiple chunks.
For instance, three objects are packed into two chunks, which are stored on two separate servers.
In practice, chunks are filled by appending object data until the chunk reaches capacity. This method is common in modern Object Storage solutions and will be used in the following sections.
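A simplified sketch of this packing strategy (in-memory only; the names and the 100MB capacity are assumptions, and real systems persist chunks to disk and replicate them):

```python
CHUNK_CAPACITY = 100 * 1024 * 1024  # hypothetical fixed chunk size

class Chunk:
    def __init__(self, chunk_id: int):
        self.chunk_id = chunk_id
        self.data = bytearray()

    def free_space(self) -> int:
        return CHUNK_CAPACITY - len(self.data)

def pack(objects: dict[str, bytes]) -> dict[str, list[tuple[int, int, int]]]:
    """Append each object into chunks; an object that does not fit into the
    current chunk's remaining space spills over into the next chunk.
    Returns key -> [(chunk_id, offset, length)] placements."""
    chunks = [Chunk(0)]
    placements: dict[str, list[tuple[int, int, int]]] = {}
    for key, data in objects.items():
        placements[key] = []
        remaining = data
        while remaining:
            current = chunks[-1]
            if current.free_space() == 0:
                current = Chunk(len(chunks))
                chunks.append(current)
            part = remaining[:current.free_space()]
            placements[key].append((current.chunk_id, len(current.data), len(part)))
            current.data.extend(part)
            remaining = remaining[len(part):]
    return placements
```

The returned placements are exactly the kind of information the Metadata Server discussed later needs to track.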
Erasure Coding
To prevent data loss, we must replicate chunks across servers. A simple way is to duplicate each chunk to another server:
This basic replication results in 2x storage overhead.
Parity Blocks
Erasure Coding (EC) offers a more storage-efficient alternative.
For example, with 2 data chunks, we can mathematically generate 1 parity block. Conceptually, think of it as:
parity = chunk_1 + chunk_2
If one chunk is lost (e.g., chunk_2), it can be recovered as:
chunk_2 = parity - chunk_1
With m parity blocks, we can tolerate the loss of up to m chunks.
For example, with 3 data chunks and 2 parities, data remains safe even if any two servers fail:
In comparison, using full replication for the same level of fault tolerance would require 3 total copies per chunk:
Erasure Coding typically uses a parity-to-data ratio of 1:2, reducing storage overhead by roughly half compared to full replication. However, Erasure Coding introduces additional write latency, primarily due to the extra encoding and decoding operations required.
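As a toy illustration of the recovery math (real systems use Reed–Solomon codes over many data and parity blocks; the chunk contents here are made up), a single XOR parity can rebuild any one lost chunk:

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

chunk_1 = b"hello world "
chunk_2 = b"media stream"            # same length for simplicity
parity = xor_blocks(chunk_1, chunk_2)  # plays the role of "parity = chunk_1 + chunk_2"

# Suppose chunk_2 is lost: recover it from the surviving chunk and the parity.
recovered = xor_blocks(parity, chunk_1)  # "chunk_2 = parity - chunk_1"
assert recovered == chunk_2
```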
Metadata Server
Let’s move to the final aspect. In the Distributed Database topic, we routed a record to its owning server using a unique key.
However, in Object Storage, the situation is more complex. Object keys (or paths) no longer determine where data physically lives, because objects are bundled into system-managed chunks.
To manage an Object Storage cluster effectively, we need to introduce a dedicated Metadata Server in addition to the actual storage servers. This server is responsible for tracking where each object resides based on its key.
It can be implemented as a simple Key-value store, mapping each key to metadata like key -> [(server, chunk, position within chunk, size within chunk)].
For example, a file is mapped on the Metadata Server to its actual storage locations.
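A minimal sketch of such a mapping (the keys, servers, and chunk numbers are invented for illustration; a production Metadata Server adds replication and consistency on top):

```python
from collections import defaultdict

# key -> [(server, chunk_id, offset_within_chunk, size_within_chunk)]
metadata: dict[str, list[tuple[str, int, int, int]]] = defaultdict(list)

def record_placement(key: str, server: str, chunk_id: int, offset: int, size: int) -> None:
    metadata[key].append((server, chunk_id, offset, size))

def locate(key: str) -> list[tuple[str, int, int, int]]:
    """Return every piece needed to reassemble the object, in order."""
    return metadata[key]

record_placement("Team/Docs/doc.md", "server-1", chunk_id=7, offset=0, size=64_000)
record_placement("Team/Docs/doc.md", "server-2", chunk_id=3, offset=512, size=36_000)
print(locate("Team/Docs/doc.md"))
```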
CDN (Content Delivery Network)
CDN plays a crucial role in delivering media content efficiently. In essence, a CDN is composed of two main components: a Caching Layer and a Backbone Network.
Caching Layer
A CDN functions as a read-through caching layer positioned in front of data sources.
For example, once a piece of data is initialized, it can be quickly retrieved from the CDN in subsequent requests:
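A minimal sketch of this read-through behavior (the in-memory dictionary and the fetch_from_origin placeholder stand in for the edge cache and the real origin request):

```python
cache: dict[str, bytes] = {}  # in-memory stand-in for the CDN edge cache

def fetch_from_origin(url: str) -> bytes:
    # Placeholder for an HTTP request to the origin/data source.
    return b"...origin content for " + url.encode()

def cdn_get(url: str) -> bytes:
    """Read-through: serve from cache if present, otherwise fetch and populate."""
    if url not in cache:
        cache[url] = fetch_from_origin(url)  # first request pays the origin cost
    return cache[url]  # subsequent requests are served from the cache
```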
Backbone Network
Typically, data is transferred over the public internet. However, long distances between endpoints result in many network hops and increased latency.
Behind the scenes, a CDN is built on an internal high-speed network, known as the Backbone Network. This network consists of dedicated fiber-optic links across regions, offering significantly faster transmission than the public internet.
When a client connects to the CDN, their request is first routed to the nearest CDN server, which may then forward it internally to the target server:
Usages
There are two common misuses of CDNs:
- Caching frequently updated data: CDNs are designed to cache static or infrequently changing content. For dynamic data that updates often, it’s better to query the source directly. If the source is geographically distant, consider using the Backbone Network to quickly replicate data to a nearby location, so clients query a nearby replica instead of the distant source (e.g., a client in Vietnam querying a replica in Singapore that is replicated over the Backbone Network from a source in North America).
- Local deployments: If content is only accessed within a limited geographic area (e.g., within one country), using a full CDN and Backbone Network might be overkill. A simpler, localized caching system may offer a more cost-effective solution.
Edge Computing
Edge Computing is a distributed computing model that builds upon the CDN’s Backbone Network. The key idea is to preprocess data at the closest possible server (Edge Server) before sending it to the main server (Origin Server).
This preprocessing can include operations like compression, filtering, or aggregation.
Edge Computing can significantly reduce both bandwidth usage and latency by shrinking and refining data close to the client before it travels to the Origin Server.
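As a rough sketch of what such preprocessing might look like (the event format and summary fields are entirely hypothetical), an Edge Server could aggregate and compress client data before forwarding only the small result to the Origin Server:

```python
import gzip
import json

def edge_preprocess(events: list[dict]) -> bytes:
    """Hypothetical edge-side step: aggregate raw client events, then compress
    the summary before forwarding it to the origin server."""
    summary = {
        "count": len(events),
        "total_bytes": sum(e.get("bytes", 0) for e in events),
    }
    return gzip.compress(json.dumps(summary).encode())

payload = edge_preprocess([{"bytes": 1200}, {"bytes": 800}, {"bytes": 950}])
# Only the small compressed summary travels over the backbone to the origin.
```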