// A visual guide to
Distributed
Systems
Modern software doesn't run on one machine. It runs on thousands — connected, collaborating, and constantly failing. This page teaches you how distributed systems work through live 3D graphics. Scroll to explore each concept.
Concept 01
CAP Theorem
A distributed system can simultaneously guarantee only two of three properties: Consistency, Availability, and Partition Tolerance. The 3D triangle above shows all three nodes — the red fault line represents a network partition.
C — Consistency
Every read receives the most recent write, or an error. All nodes see identical data at any point in time. Example: a bank balance must be exact.
A — Availability
Every request receives a response (not necessarily the latest). The system never refuses a read or write. Example: a DNS lookup always returns something.
P — Partition Tolerance
The system continues operating even when network messages are lost between nodes. All real distributed systems must tolerate partitions — so you choose C or A.
| Trade-off | System Type | Real Examples |
|---|---|---|
| CP | Consistent + Partition Tolerant | HBase, Zookeeper, Etcd, MongoDB (strong) |
| AP | Available + Partition Tolerant | Cassandra, DynamoDB, CouchDB, Riak |
| CA | Consistent + Available (no partition) | Traditional RDBMS on single node |
Concept 02
Consensus & Raft
How do distributed nodes agree on a single value even when some of them fail? The Raft algorithm solves this with leader election and append-only logs. The golden node above is the current leader; cyan nodes are followers casting votes.
Phase 1
Leader Election
A node becomes a candidate when it hasn't heard from a leader in a timeout window. It requests votes from peers. A majority win makes it the Leader. Others become Followers.
Phase 2
Log Replication
The Leader receives all client writes. It appends them to its log and sends AppendEntries RPCs to followers. Once a majority acknowledge the entry, it is committed.
Phase 3
Safety & Terms
Each election starts a new term. A leader with a lower term is immediately rejected, preventing split-brain. Leaders must have the most up-to-date log to win election.
Concept 03
Replication Strategies
Replication copies data across multiple nodes for fault tolerance and read scalability. The cyan node above is the Primary; purple nodes are Replicas. Watch data flow from primary to replicas — the red dot on one replica indicates replication lag.
Synchronous Replication
The primary waits for all replicas to acknowledge before confirming the write. Zero data loss, but higher latency. Used in: PostgreSQL synchronous standbys, Aurora.
Asynchronous Replication
The primary confirms the write immediately and replicates in the background. Lower latency but replicas may lag — if the primary crashes, recent writes can be lost.
Semi-Synchronous
Wait for at least one replica to acknowledge. Balances durability and performance. Reduces the blast radius of primary failure to a single replica lag window.
| Type | RPO (data loss) | Write Latency | Failover |
|---|---|---|---|
| Synchronous | 0 ms | High | Clean |
| Asynchronous | Seconds to minutes | Low | Possible data loss |
| Semi-sync | Bounded lag | Medium | Near-zero loss |
Concept 04
Network Partitions
A network partition occurs when nodes can no longer communicate with each other — even though the nodes themselves are alive. The red fault line above splits two sub-clusters. Packets (red dots) bounce back rather than crossing the partition.
Split-Brain
Each sub-cluster believes it is the primary and accepts writes — leading to conflicting data when the network heals. Detecting and resolving this is one of distributed systems' hardest problems.
Fencing & STONITH
"Shoot The Other Node In The Head" — ensure only one partition can write by physically fencing the other (power-cycling, removing network access). Prevents split-brain at hardware level.
Conflict Resolution
When partitions heal, conflicting writes must be merged. Strategies: last-write-wins, vector clocks, CRDTs (Conflict-Free Replicated Data Types), or application-level merge logic.
Concept 05
Observability
You can't fix what you can't see. Observability is the ability to understand the internal state of a distributed system from its external outputs. The three pillars — Logs, Metrics, and Traces — flow into the golden collector node above.
Logs
Timestamped, structured records of discrete events. Answer: "What happened?" Tool: Loki, Elasticsearch. Emit in JSON with trace-ID for correlation.
Metrics
Numeric measurements over time (counters, gauges, histograms). Answer: "How fast / how many?" Tool: Prometheus, Mimir. Drive dashboards and alerts.
Traces
Context-propagated, causally-linked spans across service boundaries. Answer: "Why was this slow?" Tool: Jaeger, Tempo, OpenTelemetry. Track a request end-to-end.
| Signal | Cardinality | Storage Cost | Latency Insight |
|---|---|---|---|
| Logs | High | High | Verbose, exact context |
| Metrics | Low | Low | Trends & rate |
| Traces | Medium | Medium | Request-level bottlenecks |
Concept 06
Service Mesh
A service mesh is a dedicated infrastructure layer for service-to-service communication. Each service (blue orb) has a sidecar proxy (gold orb) injected alongside it. All network traffic flows through these proxies — enabling mTLS, retries, circuit breaking, and traffic shaping without any changes to application code.
mTLS — Zero Trust
Mutual TLS ensures both sides authenticate each connection. Even inside a private network, every service proves its identity. Certificates rotate automatically.
Circuit Breaker
When a downstream service fails repeatedly, the proxy opens a circuit and stops sending requests — preventing cascade failures. Retries with exponential back-off are applied first.
Traffic Control
Split traffic by weight (canary deploys), by header (A/B testing), or by fault injection (chaos testing). All configured at the mesh level, not in service code. Tools: Istio, Linkerd, Cilium.
Concept 07
Consistent Hashing
Traditional sharding uses key mod N — any server change remaps almost every key.
Consistent hashing places both keys and servers on a circular ring so only 1/N keys
move when topology changes. Above: the white dot is a key traversing the ring clockwise, landing on
the first server it encounters. Colored arcs show each server's ownership range.
The Ring
Keys and servers both map to positions on a [0, 2π) circle via a hash function. A key is owned by the first server encountered clockwise from its position — no central coordinator needed.
Virtual Nodes
Each physical server owns multiple virtual tokens scattered across the ring. This evens load distribution and means a failure redistributes tokens across many neighbors — not just one — preventing hotspots.
Minimal Disruption
A joining server claims tokens only from its clockwise neighbor. A departing server passes its range to its successor. Only K/N keys move — no full reshuffle, no downtime.
Concept 08
Load Balancing
A load balancer distributes inbound traffic across a pool of backends so no single server becomes a bottleneck. Above: the purple Load Balancer receives requests from clients (blue) and fans out to backends (cyan). Health checks probe each backend — failing nodes are automatically removed from rotation without manual intervention.
Algorithms
Round Robin — rotate through backends sequentially.
Least Connections — route to server with fewest open connections.
Weighted — assign more traffic to higher-capacity servers.
IP Hash — same client always hits the same backend (sticky sessions).
L4 vs L7
L4 (Transport): routes on IP + port, no HTTP awareness — ultra-fast. Example: AWS NLB, HAProxy TCP.
L7 (Application): routes on HTTP headers, paths, cookies — enables canary deploys, A/B testing, URL-based routing. Example: AWS ALB, nginx, Envoy.
Health & Failover
LBs probe backends periodically (HTTP 200 on /health). N consecutive failures pulls a backend from the pool. Passive checks also monitor real-traffic error rates — catching slow-to-respond backends before they cascade.
| Algorithm | Best For | Limitation |
|---|---|---|
| Round Robin | Uniform stateless requests | Ignores server capacity |
| Least Connections | WebSockets, long-lived connections | Requires LB state |
| IP Hash | Session-dependent apps | Uneven on small pools |
| Random | Truly stateless microservices | No load awareness |
Concept 09
Message Queues & Event Streaming
Message queues decouple producers from consumers — a producer doesn't need to know who reads its messages, or if the consumer is even online. Above: producers (blue) push events into the Kafka broker (gold layered discs), which fans them out to consumer groups (cyan). Messages are persisted on disk and can be replayed from any historical offset.
Delivery Guarantees
At most once — fire & forget. May lose messages, no duplicates.
At least once — retry until acked. May duplicate, never loses.
Exactly once — idempotent producers + transactional consumers. Available in Kafka 0.11+. Highest overhead.
Consumer Groups
A consumer group distributes partitions across its members — each partition owned by exactly one group member at a time. Multiple independent groups can read the same topic simultaneously at different offsets, enabling fan-out to analytics, audit, and downstream services.
Queue vs Stream
Queue (RabbitMQ, SQS): push-based, messages deleted on consumption. Good for task distribution.
Stream (Kafka, Kinesis): pull-based, messages retained for days. Consumers track their own offset — supports replay, event sourcing, and time-travel debugging.