Media Storage

Media data (videos, images, etc.) consists of unstructured binary content. Storing such data in a traditional database is not ideal. Instead, it should be read and stored directly via a standard file system interface.

Media data is especially critical for client-facing services and often serves as the backbone of many applications, such as Netflix and YouTube. In this section, we will explore common approaches to managing media data effectively.

File Storage

File Storage refers to a system that exposes storage over the network as a standard file system, allowing compute and storage resources to scale independently.

It is well-suited for high-performance and low-latency applications, enabling them to access complete files directly over the network. Moreover, multiple clients (either services or end-users) can share the same storage backend.

For example, two services share the same file storage system.

[Diagram: Service 1 and Service 2 share one File Storage backend containing Docs/ (doc1.md, doc2.md) and Images/ (img.png).]

To improve availability and read performance, we can add replicas.

[Diagram: a Primary File Server replicating to Replica 1 and Replica 2.]

However, this setup resembles the Master-Slave model: every write must go through the primary server, limiting write performance, and the primary becomes a single point of failure that reduces overall availability.

Object Storage

Although a relatively new technology, Object Storage has been rapidly gaining popularity.

The key advantage of Object Storage lies in its distributed architecture. Files, now referred to as objects, are independent and autonomously distributed across multiple servers. This results in a highly available and fault-tolerant system.

Let’s explore how Object Storage is implemented in practice.

Object Distribution

As discussed in the Peer-to-peer architecture section, we aim to distribute objects across multiple servers. This allows concurrent writes and improves scalability.

[Diagram: img.png stored on Server 1 and doc.md stored on Server 2.]
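
The text does not prescribe a placement scheme, but as a minimal sketch (a hypothetical helper, not a real system's API), hashing the key is one common way to decide which server owns an object:

```python
import hashlib

SERVERS = ["server_1", "server_2"]

def pick_server(key: str, servers: list[str] = SERVERS) -> str:
    """Hash the object key to deterministically pick an owning server.

    A stable hash (not Python's randomized hash()) keeps placement
    consistent across processes.
    """
    digest = hashlib.md5(key.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

# Different keys can land on different servers, enabling concurrent writes.
print(pick_server("img.png"), pick_server("doc.md"))
```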

Additionally, each server maintains replicas to increase fault tolerance. For example, two servers continuously replicate data to one another.

[Diagram: Server 1 holds img.png (primary) and doc.md (replica); Server 2 holds doc.md (primary) and img.png (replica).]

Object Naming

This distributed model complicates how we interact with objects. Traditionally, we access files using a hierarchical path structure like /Team/Docs/doc.md. Without a centralized file system, there’s no inherent relationship between objects (e.g., directories or siblings).

To simulate the familiar file system structure, Object Storage systems often include the full path in the object’s key, for example:

[Diagram: the keys /Users/Image/img.png, /Team/Docs/doc.md, and /Team/Docs/README.md distributed across Server 1 and Server 2.]

This is merely a naming convention. Operations like listing files in a folder still require scanning across multiple servers.
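
To make this concrete, here is a minimal sketch of listing a "folder" by prefix, assuming a hypothetical in-memory view of each server's keys:

```python
# Hypothetical in-memory view of two storage servers and their object keys.
servers = {
    "server_1": ["/Users/Image/img.png", "/Team/Docs/doc.md"],
    "server_2": ["/Team/Docs/README.md"],
}

def list_folder(prefix: str) -> list[str]:
    """'List a directory' by scanning all servers for keys under the prefix.

    There is no real directory tree, so every server must be queried.
    """
    return sorted(
        key
        for keys in servers.values()
        for key in keys
        if key.startswith(prefix)
    )

print(list_folder("/Team/Docs/"))
# ['/Team/Docs/README.md', '/Team/Docs/doc.md']
```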

Chunking

Simply distributing objects isn’t sufficient. Unlike typical database records, object sizes can vary dramatically. Large objects demand more storage and processing resources, which can lead to imbalances across servers.

[Diagram: doc.md (30KB) on Server 1 and img.png (200MB) on Server 2.]

Object Chunking

To address this, we can divide objects into fixed-size units called chunks. Chunking not only balances resource usage but also allows parallel read/write operations across servers.

For instance, if the configured chunk size is 100MB, a 200MB object will be split into two chunks, which can be stored on different servers:

[Diagram: Object Storage splits img.png (200MB) into img.png.chunk_1 (100MB) on Server 1 and img.png.chunk_2 (100MB) on Server 2.]
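
A minimal sketch of the splitting step, using toy byte counts in place of the 100MB chunk size:

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Slice an object into fixed-size chunks; the last one may be smaller."""
    return [data[i : i + chunk_size] for i in range(0, len(data), chunk_size)]

# Toy numbers: a 100-byte "chunk size" splits a 250-byte object into
# two full chunks plus one partial chunk.
print([len(c) for c in split_into_chunks(bytes(250), chunk_size=100)])
# [100, 100, 50]
```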

However, this approach has some trade-offs:

  • Using small chunk sizes improves load balancing but results in too many chunks spread across servers. Retrieving an object then requires significant coordination and computing effort. For example, an object is distributed across four servers, requiring queries to all of them for retrieval.
[Diagram: img.png (500MB) split into 100MB chunks (img.png.chunk_1 through img.png.chunk_4) spread across Servers 1-4.]
  • On the other hand, using large chunk sizes reduces the number of chunks but can create resource imbalances: objects smaller than the chunk size occupy only a fraction of a chunk, so storage and load spread unevenly. For example, several small objects end up inefficiently concentrated on the same server.
[Diagram: with a 100MB chunk size, doc1.md (10MB), doc2.md (20MB), and doc3.md (30MB) each become a single small chunk, while img.png (200MB) fills two full 100MB chunks, leaving the two servers unevenly loaded.]

Object Storage serves diverse clients with a wide range of file types, making it exceptionally difficult to define a single, optimal chunk size for all use cases.

Chunk Packing

To achieve better control, instead of slicing user objects arbitrarily, we define fixed-size system chunks and pack multiple objects into each chunk.

In this model, a chunk is a system-level file containing multiple objects. If an object exceeds the chunk size, it’s split across multiple chunks.

For instance, three objects are packed into two chunks, which are stored on two separate servers.

[Diagram: with 100MB chunks, img1.png (50MB), img2.png (100MB), and img3.png (50MB) are packed into two chunks; Server 1's chunk_1 holds img1.png plus the first half of img2.png, and Server 2's chunk_1 holds the second half of img2.png plus img3.png.]

In practice, chunks are filled by appending object data until the chunk reaches capacity. This method is common in modern Object Storage solutions and will be used in the following sections.
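
Below is a minimal sketch of this append-based packing, using toy sizes and a hypothetical Placement record (assumed for illustration) to track where each piece lands:

```python
from dataclasses import dataclass

CHUNK_CAPACITY = 100  # toy capacity; real systems use e.g. 100MB

@dataclass
class Placement:
    chunk_id: int
    position: int  # byte offset of this piece within the chunk
    size: int      # bytes of the object stored in this chunk

def pack(objects: dict[str, bytes], capacity: int = CHUNK_CAPACITY):
    """Append objects into fixed-size chunks, splitting an object across
    chunk boundaries when it does not fit in the remaining space."""
    chunks: list[bytearray] = [bytearray()]
    index: dict[str, list[Placement]] = {}
    for name, data in objects.items():
        index[name] = []
        while data:
            free = capacity - len(chunks[-1])
            if free == 0:                  # current chunk is full
                chunks.append(bytearray())
                free = capacity
            piece, data = data[:free], data[free:]
            index[name].append(
                Placement(len(chunks) - 1, len(chunks[-1]), len(piece))
            )
            chunks[-1].extend(piece)
    return chunks, index

_, index = pack({"img1.png": bytes(50), "img2.png": bytes(100), "img3.png": bytes(50)})
print(index["img2.png"])  # split: 50 bytes in chunk 0, then 50 bytes in chunk 1
```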

Erasure Coding

To prevent data loss, we must replicate chunks across servers. A simple way is to duplicate each chunk to another server:

[Diagram: Server 1 holds chunk_1 and chunk_2_replica; Server 2 holds chunk_2 and chunk_1_replica.]

This basic replication results in 2x storage overhead.

Parity Blocks

Erasure Coding (EC) offers a more storage-efficient alternative.

For example, with 2 data chunks, we can mathematically generate 1 parity block. Conceptually, think of it as: parity = chunk_1 + chunk_2

ℹ️
Here, the parity block is created by combining the two data chunks using a specific encoding operation (often XOR or addition).
[Diagram: chunk_1 + chunk_2 = parity.]

If one chunk is lost (e.g., chunk_2), it can be recovered as: chunk_2 = parity - chunk_1

[Diagram: chunk_2 = parity - chunk_1.]
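
A minimal sketch using XOR as the encoding operation (one common choice; production systems typically use Reed-Solomon codes to support multiple parity blocks):

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

chunk_1 = b"hello world, chunk one.."
chunk_2 = b"another chunk of 24 byte"

parity = xor(chunk_1, chunk_2)    # encode: parity = chunk_1 ^ chunk_2

# chunk_2 is lost; XOR is its own inverse, so it can be rebuilt:
recovered = xor(parity, chunk_1)  # decode: chunk_2 = parity ^ chunk_1
assert recovered == chunk_2
```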

With m parity blocks, we can tolerate the loss of up to m chunks. For example, with 3 data chunks and 2 parity blocks, the data remains safe even if any two servers fail:

[Diagram: chunk_1, chunk_2, chunk_3, parity_1, and parity_2 placed on Servers 1-5, one per server.]

In comparison, using full replication for the same level of fault tolerance would require 3 total copies per chunk:

[Diagram: chunk_1 and chunk_2 each kept as three copies (one primary plus two replicas) spread across Servers 1-4.]

Erasure Coding typically uses a parity-to-data ratio of around 1:2, storing about 1.5x the raw data where full replication would store 2x, roughly halving the storage overhead. However, Erasure Coding introduces additional write latency, primarily due to the extra encoding and decoding operations required.
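
To put numbers on it, a quick sketch comparing total storage per raw byte for the configurations above:

```python
def ec_factor(data: int, parity: int) -> float:
    """Total stored bytes per raw byte with `parity` blocks per `data` chunks."""
    return (data + parity) / data

# 2 data + 1 parity (the 1:2 ratio): 1.5x raw, vs 2x for one full replica.
print(ec_factor(2, 1), "vs replication", 1 + 1)
# 3 data + 2 parity (tolerates two failures): ~1.67x raw, vs 3x replication.
print(round(ec_factor(3, 2), 2), "vs replication", 2 + 1)
```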

Metadata Server

Let’s move to the final aspect. In the Distributed Database topic, we routed a record to its owning server using a unique key.

However, in Object Storage the situation is more complex: an object's key (or path) no longer tells us where the data lives, because objects are bundled into system-managed chunks.

To manage an Object Storage cluster effectively, we need to introduce a dedicated Metadata Server in addition to the actual storage servers. This server is responsible for tracking where each object resides based on its key.

It can be implemented as a simple Key-value store, mapping keys to metadata like: key -> [(server, chunk, position within chunk, size within chunk)]. For example, a file is mapped on the Metadata Server to its actual storage locations:

    doc.md:
      name: doc.md
      size: 25MB
      storage:
        Server_1:
          chunk_1:
            position: 123
            size: 5MB
          chunk_2:
            position: 0
            size: 10MB
        Server_2:
          chunk_1:
            position: 0
            size: 10MB

[Diagram: the Metadata Server holds this record; Storage Server 1 holds chunk_1 and chunk_2, and Storage Server 2 holds chunk_1.]
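
As a sketch, reading an object then means looking up its metadata and fetching each piece in order. The `fetch` transport call below is an assumption for illustration, not a real API:

```python
# Hypothetical metadata record for doc.md, mirroring the mapping above:
# key -> [(server, chunk, position within chunk, size within chunk)]
MB = 1024 * 1024
metadata = {
    "doc.md": [
        ("Server_1", "chunk_1", 123, 5 * MB),
        ("Server_1", "chunk_2", 0, 10 * MB),
        ("Server_2", "chunk_1", 0, 10 * MB),
    ],
}

def read_object(key: str, fetch) -> bytes:
    """Assemble an object by fetching each piece listed in its metadata.

    `fetch(server, chunk, position, size)` is an assumed transport call that
    reads `size` bytes starting at `position` from a chunk on a server.
    """
    return b"".join(fetch(*entry) for entry in metadata[key])

# Demo with a fake transport returning placeholder bytes.
print(len(read_object("doc.md", lambda s, c, pos, size: bytes(size))) // MB)  # 25
```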

CDN (Content Delivery Network)

CDN plays a crucial role in delivering media content efficiently. In essence, a CDN is composed of two main components: a Caching Layer and a Backbone Network.

Caching Layer

A CDN functions as a read-through caching layer positioned in front of data sources.

For example, once a piece of data has been fetched and cached, subsequent requests can retrieve it from the CDN quickly:

[Diagram: 1. Client requests data → 2. CDN queries the backend → 3. CDN caches the response → 4. CDN responds → 5. Client requests the data again → 6. CDN responds with the cached data immediately.]
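
A minimal sketch of this read-through behavior, with the origin call simulated:

```python
import time

cache: dict[str, bytes] = {}

def fetch_from_origin(key: str) -> bytes:
    """Stand-in for querying the distant backend (step 2)."""
    time.sleep(0.1)  # simulated network latency
    return f"content of {key}".encode()

def cdn_get(key: str) -> bytes:
    """Read-through: on a miss, query the backend and cache the result."""
    if key not in cache:
        cache[key] = fetch_from_origin(key)  # steps 2-3
    return cache[key]                        # steps 4 and 6

cdn_get("video.mp4")  # first request goes to the origin
cdn_get("video.mp4")  # repeat request is served from cache immediately
```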

Backbone Network

Typically, data is transferred over the public internet. However, long distances between endpoints result in many network hops and increased latency.

Behind the scenes, a CDN is built on an internal high-speed network known as the Backbone Network. This network consists of dedicated fiber-optic links across regions, offering significantly faster transmission than the public internet.

When a client connects to the CDN, their request is first routed to the nearest CDN server, which may then forward it internally to the target server:

[Diagram: a client in Vietnam requests data from North America; the nearest CDN server in Southeast Asia receives the request and forwards it over the Backbone Network to the server in North America.]
ℹ️
Major CDN providers like AWS and Cloudflare operate their own backbone networks. Some large tech companies (e.g., Facebook, Netflix) even build proprietary networks to optimize performance and reduce costs.

Usage

There are two common misuses of CDNs:

  1. Caching frequently updated data: CDNs are designed for caching static or infrequently changing content. For dynamic data that updates often, it’s better to query the source directly. If the source is geographically distant, consider using the Backbone Network to quickly replicate data to a nearby location:

    [Diagram: the source in NA replicates data over the Backbone Network to a replica in Singapore; the client in Vietnam queries the nearby replica.]
  2. Local deployments: If content is only accessed within a limited geographic area (e.g., within one country), using a full CDN and Backbone Network might be overkill. A simpler, localized caching system may offer a more cost-effective solution.

Edge Computing

Edge Computing is a distributed computing model that builds upon the CDN’s Backbone Network. The key idea is to preprocess data at the closest possible server (Edge Server) before sending it to the main server (Origin Server).

This preprocessing can include operations like compression, filtering, or aggregation.

[Diagram: 1. the client in Vietnam sends data to the nearest server in Southeast Asia, 2. which preprocesses it and 3. forwards the preprocessed data to the server in North America.]

Edge Computing dramatically reduces both bandwidth usage and latency by optimizing data closer to the client before transmission.
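
As a sketch, edge-side preprocessing can be as simple as compressing a payload before it crosses the backbone (gzip is used here purely for illustration):

```python
import gzip

def preprocess_at_edge(payload: bytes) -> bytes:
    """Edge-side preprocessing: compress data before the long haul."""
    return gzip.compress(payload)

# Repetitive telemetry compresses extremely well, cutting bandwidth usage.
raw = b"sensor-reading:42;" * 10_000          # 180,000 bytes
compressed = preprocess_at_edge(raw)
print(len(raw), "->", len(compressed))        # e.g. 180000 -> a few hundred bytes
```

Only the compact representation then travels over the Backbone Network to the Origin Server.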
