Peer-to-peer Architecture

In terms of high availability and resiliency, the Master-slave model is not ideal because too much responsibility is concentrated in the master server.

By contrast, Peer-to-peer Architecture adopts a distributed model, where the system is operated cooperatively by multiple servers, each sharing equal responsibility. These servers are referred to as peers (or nodes).

Data Ownership

Conceptually, the database is divided into multiple shards, each managed by a peer.

For instance, a database might be split into three shards, each assigned to a separate server.

Diagram: the database is divided into Shard 1, Shard 2, and Shard 3, assigned to Server 1, Server 2, and Server 3 respectively.

Each peer is responsible for managing a portion of the database. Even if some of them go down, only the respective shards become unavailable, while the rest of the system remains fully functional.

Importantly, we are not splitting the database into large storage blocks and distributing them across servers. Instead, a more granular approach is used: each individual record is assigned to a peer based on its unique key and a mapper function.

For example, consider a set of user records identified by userId (an integer):

  • Suppose the system has three servers, with serverId values from 0 to 2.
  • The mapper function is defined as ownerServerId = userId % numberOfServers, where numberOfServers = 3 (see the sketch after the diagram below).
Diagram: User 0 (0 % 3 = 0) maps to Server 0; User 1 (1 % 3 = 1) and User 4 (4 % 3 = 1) map to Server 1; User 2 (2 % 3 = 2) maps to Server 2.
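
To make the mapping concrete, here is a minimal Python sketch of the modulo-based mapper described above. The function and constant names are illustrative, not part of any particular database:

```python
# Minimal sketch of modulo-based record-to-server mapping.
NUMBER_OF_SERVERS = 3

def owner_server_id(user_id: int, number_of_servers: int = NUMBER_OF_SERVERS) -> int:
    """Map a record key to the server that owns it."""
    return user_id % number_of_servers

for user_id in (0, 1, 4, 2):
    print(f"User {user_id} -> Server {owner_server_id(user_id)}")
# User 0 -> Server 0
# User 1 -> Server 1
# User 4 -> Server 1
# User 2 -> Server 2
```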

Each server handles reads and writes for its own records, which improves availability. Additionally, because user keys increase sequentially, storage volume is distributed evenly among the peers, balancing resource usage.

However, this solution becomes problematic when the number of servers changes. Adding new servers requires updating the mapper function, and previously stored data may become unreachable because it was mapped using the old function.

In the example above, if we expand to four servers and change the mapper to userId % numberOfServers(4):

  • To locate User 4, we calculate 4 % 4 = 0 (Server 0), yet the data for User 4 was previously stored on 4 % 3 = 1 (Server 1).
  • Correcting this would require an inefficient rehash and migration of the entire database, as the sketch below illustrates.
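
A quick sketch (the key count is arbitrary) shows roughly how much data would change owners if the cluster grew from 3 to 4 servers under modulo mapping:

```python
# Count keys whose owner differs between "mod 3" and "mod 4" mapping.
keys = range(10_000)
moved = sum(1 for k in keys if k % 3 != k % 4)
print(f"{moved / len(keys):.0%} of keys would change owners")  # roughly 75%
```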

Consistent Hashing

As demonstrated, traditional hashing tightly couples the number of servers to the data mapping, making it brittle during server changes. To address this, we use a technique called Consistent Hashing, which decouples records from the number of servers by mapping them onto a fixed, consistent range.

It’s easier to understand through an example:

Virtual Ring

First, we define a hash function that returns a value within a fixed range, [0, N]: map(value) -> output in [0, N]. This range is conceptualized as a ring, wrapping values from N back to 0.

For instance, we can use value % 100 to map values to [0 → 99], forming a circular virtual ring:

Consistent Hashing Ring

Placing servers

Next, we place servers onto the ring by hashing their server IDs. Each server occupies a specific, predictable point on the ring.

Placing Server on Ring

Placing records

We hash record keys using the same function and place them onto the ring.

Placing Record on Ring

To determine a record’s owner server, we scan clockwise (or counterclockwise) around the ring from the record’s position. The first server we encounter is assigned ownership.

Finding the Closest Server
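
The steps above (a fixed-range hash, placing servers, placing records, and walking clockwise) can be captured in a short Python sketch. This is a simplified illustration rather than a production implementation; the ring size, the CRC32 hash, and the class name are assumptions made for the example:

```python
import bisect
import zlib

# The text uses a [0, 99] ring for illustration; real systems use a much
# larger range (e.g. 2**32) so that servers and records rarely collide.
RING_SIZE = 2**32

def ring_hash(key) -> int:
    """Deterministically map any key onto the fixed ring range."""
    return zlib.crc32(str(key).encode()) % RING_SIZE

class ConsistentHashRing:
    def __init__(self):
        self._positions = []   # sorted ring positions of the servers
        self._servers = {}     # ring position -> server id

    def add_server(self, server_id: str) -> None:
        pos = ring_hash(server_id)
        bisect.insort(self._positions, pos)
        self._servers[pos] = server_id

    def remove_server(self, server_id: str) -> None:
        pos = ring_hash(server_id)
        self._positions.remove(pos)
        del self._servers[pos]

    def owner(self, record_key) -> str:
        """Walk clockwise from the record's position to the first server."""
        pos = ring_hash(record_key)
        idx = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._servers[self._positions[idx]]

# Example: three servers and a record key.
ring = ConsistentHashRing()
for server in ("S1", "S42", "S83"):
    ring.add_server(server)
print(ring.owner("user:4"))  # the first server clockwise from the record
```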

Improvements

Does consistent hashing completely solve the rehashing problem? No, but it greatly mitigates the issue.

For example, when a new server (e.g., S0) is added, only a portion of the data from neighboring servers (S1 and S83) needs to be migrated, rather than the entire database.

Migration Example

Thus, consistent hashing avoids the need to remap the entire database when scaling out.
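
Continuing the sketch above (reusing the same assumed ring object), we can verify that adding a fourth server moves only a fraction of the keys rather than remapping everything:

```python
# Record which server owns each key before the new server joins.
keys = [f"user:{i}" for i in range(10_000)]
before = {key: ring.owner(key) for key in keys}

# Add a new server and count how many keys changed owner.
ring.add_server("S0")
moved = sum(1 for key in keys if ring.owner(key) != before[key])
print(f"{moved / len(keys):.0%} of keys migrated")  # only the keys S0 now owns
```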

Virtual Nodes

One remaining issue with Consistent Hashing is imbalance. Real-world data distributions can cause hotspots: some servers become overloaded while others remain underutilized.

For example, S83 may be under stress with a large number of records, while the other servers remain idle.

Imbalance in Hashing

There are two main causes of the imbalance:

  • A poor hash function that fails to distribute values evenly.
  • A ring that is too sparse for the number of servers, creating large gaps.

To address these issues, servers can be assigned multiple virtual nodes on the ring.

Virtual Servers

Each physical server is represented by multiple virtual IDs, reducing gaps and achieving a more balanced distribution.
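
Building on the earlier ring sketch, virtual nodes can be modeled by registering each physical server on the ring under several derived IDs. The suffix scheme and the virtual-node count below are illustrative assumptions:

```python
VIRTUAL_NODES = 8  # virtual positions per physical server (illustrative)

class VirtualNodeRing(ConsistentHashRing):
    def add_server(self, server_id: str) -> None:
        # Place the server on the ring once per virtual node: "S1#0", "S1#1", ...
        for i in range(VIRTUAL_NODES):
            super().add_server(f"{server_id}#{i}")

    def remove_server(self, server_id: str) -> None:
        for i in range(VIRTUAL_NODES):
            super().remove_server(f"{server_id}#{i}")

    def owner(self, record_key) -> str:
        # Strip the virtual-node suffix to report the physical server.
        return super().owner(record_key).split("#", 1)[0]
```

With several positions per server, the gaps between neighbors shrink and the load evens out, at the cost of the heavier migrations described next.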

However, using too many virtual nodes increases system complexity, particularly during data migration. For example, when removing a server from the cluster, we must also migrate the data associated with all of its virtual nodes. This process can involve a large number of physical servers, significantly increasing the migration workload.

Shard Replication

Allowing shards to reside on only one server is risky. If that server crashes without recovery, the shard and its data will be lost.

Thus, we must introduce replication:

  • Each shard has one primary owner and one or more replicas stored on different servers. The total number of copies of each shard (primary plus replicas) is commonly referred to as the Replication Factor.
  • Replica shards can independently serve read queries, enhancing both availability and performance.

For example, with 3 shards and a replication factor of 2:

  • Server 1 holds Shard 1 (primary) and Shard 2 (replica).
  • Server 2 holds Shard 2 (primary) and Shard 3 (replica).
  • Server 3 holds Shard 3 (primary) and Shard 1 (replica).
Diagram: the original database's three shards laid out across the peer-to-peer cluster, with each primary shard replicating to a replica on another server.

Replica Selection

A simple strategy is to pick the next servers clockwise on the ring.

Picking Replicas
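
As a rough sketch on top of the ring class above (it reaches into the ring's internal lists for brevity; the function name and the default replication factor of 2 are assumptions):

```python
def replica_servers(ring: ConsistentHashRing, record_key, replication_factor: int = 2):
    """Return the primary plus the next distinct servers clockwise as replicas."""
    pos = ring_hash(record_key)
    start = bisect.bisect_left(ring._positions, pos)
    owners = []
    for i in range(len(ring._positions)):
        position = ring._positions[(start + i) % len(ring._positions)]
        server = ring._servers[position]
        if server not in owners:
            owners.append(server)
        if len(owners) == replication_factor:
            break
    return owners  # owners[0] is the primary, the rest are replicas

print(replica_servers(ring, "user:4"))  # e.g. ['S83', 'S1'], depending on positions
```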

Some systems strengthen this further by considering infrastructure diversity, for example, placing replicas across different data centers or regions to guard against localized failures.

Master-slave and Peer-to-peer

The Master-slave model is simple to operate. However, the master server becomes the system’s single point of failure, and reliance on it significantly degrades the system’s availability.

Peer-to-peer provides a more flexible and highly available cluster in which every node is equally important. However, maintaining consistency across peers becomes increasingly difficult as the network scales. For highly coupled data models like SQL, this approach can be challenging: data is scattered across multiple servers, and operations like transactions or joins spanning many servers over the network become extremely costly and, at times, impossible.

In fact, many SQL databases treat the Master-slave model as their native setup. On the other hand, NoSQL databases, which avoid joins and transactions, use Peer-to-peer for high availability and fault tolerance.

Decentralized Cluster

We have extensively discussed data sharding and replication. Now, the question arises: how can we effectively combine them into a single virtual database?

A decentralized cluster must ensure that metadata (e.g., member addresses, sharding information, etc.) is both reliable and consistently shared across all members. This consistency is critical for enabling operations like replication, sharding, and promotion.

Diagram: Peer 1, Peer 2, and Peer 3 forming a decentralized cluster and sharing metadata.

Distributed Properties

Before moving forward, we need to explore the CAP Theorem, a fundamental trade-off that governs distributed systems.

In essence, distributed systems are characterized by three key properties: Consistency, Availability, and Partition Tolerance.

Consistency (C)

Consistency in the context of CAP means that all nodes contain the same data, either immediately or eventually through synchronization mechanisms.

Availability (A)

Availability (A) ensures that every request receives a response, even if the response contains outdated or inconsistent data. In short, a system is considered available as long as it responds, regardless of accuracy.

ℹ️
We’ve only briefly touched on Consistency and Availability here; deeper explanations will follow in the sections below.

Partition Tolerance (P)

To understand Partition Tolerance, we first need to grasp the concept of a Partition:

Network Partition

A network partition occurs when failures split a cluster into isolated groups of nodes, preventing them from communicating with one another.

For example, consider a cluster of three servers that constantly cooperate to maintain synchronization:

Diagram: Server A, Server B, and Server C communicating with one another.

Now imagine a network failure disrupts communication between Server C and the others. The cluster splits into two isolated partitions: Partition 1 (A, B) and Partition 2 (C).

Diagram (network partition): the failure isolates Server C, splitting the cluster into Partition 1 (Server A, Server B) and Partition 2 (Server C).

Partition Tolerance (P) is a system’s ability to continue functioning correctly despite these network partitions.

CAP Theorem

The CAP theorem states that a distributed database can satisfy only two of the following three properties simultaneously: Consistency, Availability, and Partition Tolerance.

Thus, practical systems must choose between three design patterns: AP, CP, or CA.

CA System

A CA system provides Consistency (C) and Availability (A) but not Partition Tolerance.

In practice, this pattern is rarely applied. When a network partition occurs, a CA system would either stop working entirely or behave incorrectly; both outcomes are unacceptable. Since network partitions are inevitable in real-world environments, a system that does not tolerate partitions is essentially unusable.

Thus, the real-world battle comes down to AP vs CP. In the presence of a partition, a distributed system must choose between Consistency and Availability.

CP (Consistency over Availability) System

Consider a cluster of two servers:

  • Server A hosts Shard 1.
  • Server B maintains a replica of Shard 1.
  • If clients write to Shard 1 via B, B forwards the request to A (the shard owner).
Diagram: a client writes to Shard 1 through Server B, which forwards the request to Server A, the primary owner of the shard.

Suppose a network partition occurs, separating A from B. Now, clients connecting to B can only read from the replica; writes are disabled to preserve consistency.

Diagram: during the partition, Server A keeps serving reads and writes for Shard 1, while Server B serves only reads from Replica 1.

This is a CP system: it prioritizes Consistency over Availability, sacrificing write operations on isolated replicas.

AP (Availability over Consistency) System

Now, let’s modify the previous example to favor Availability. Instead of disabling writes, B temporarily accepts writes even while partitioned from A.

Diagram: during the partition, both Server A (Shard 1) and Server B (Replica 1) continue to accept reads and writes.

In this AP system, partitions remain fully functional, but at the cost of Consistency: different partitions may accept conflicting updates.

Diagram: Client 1 writes user {id: 10, name: John} to Server A while Client 2 writes user {id: 10, name: Doe} to Server B, leaving the two partitions with conflicting versions of the same record.

Important: Consistency here refers to cross-partition consistency during a network split, not the usual node-to-node replication consistency. Since partitions cannot communicate, inconsistencies persist until the cluster is healed.
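
The difference between the CP and AP behaviors above can be sketched in a few lines of Python. The class, field, and mode names are invented for illustration; real systems implement this decision with far more machinery:

```python
class PartitionError(Exception):
    """Raised when a write is refused to preserve consistency (CP behavior)."""

class ReplicaNode:
    def __init__(self, mode: str):
        self.mode = mode                  # "CP" or "AP"
        self.primary_reachable = True     # flipped to False during a partition
        self.local_data = {}              # this replica's local copy of the shard
        self.pending_writes = []          # writes to reconcile after healing (AP)

    def write(self, key, value):
        if self.primary_reachable:
            # Normal case: forward the write to the primary owner of the shard.
            self.forward_to_primary(key, value)
        elif self.mode == "CP":
            # CP: refuse the write rather than risk divergence across partitions.
            raise PartitionError("primary unreachable; write rejected")
        else:
            # AP: accept the write locally and reconcile once the partition heals.
            self.local_data[key] = value
            self.pending_writes.append((key, value))

    def forward_to_primary(self, key, value):
        self.local_data[key] = value      # placeholder for the real forwarding path
```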

Choosing between Consistency and Availability is a fundamental decision when designing a distributed database. In the following sections, we will explore two major approaches for managing decentralized clusters:

  • Gossip Protocol.
  • Consensus Protocol.