Consistency models: pick the weakest one you can defend

“CAP theorem” is what most engineers remember from school about distributed consistency, and it is also the least useful thing to know about it. CAP tells you that during a network partition you must choose between consistency and availability, which is true and almost never the question anyone is actually trying to answer. The real question comes earlier, while the network is healthy: what do we mean by consistency? There is not one answer. There are at least a dozen distinct consistency models, they differ in ways that matter for user-visible behavior, and they cost wildly different amounts to implement.

Most applications are paying for a stronger model than they need, or claiming to provide a stronger one than they actually do. Both are expensive, in different ways. The goal of this post is a working map of the important models — from the strongest to the weakest — and a heuristic for picking honestly.

The two axes that confuse everyone

Consistency models get confusing mostly because people conflate two separate properties.

Per-object consistency is about a single data item seen from multiple clients. If I write X=1 and then you read X, what can you see? The strongest answer here is linearizability: you see the most recent write, full stop, as if there were a single copy.

Multi-object consistency is about a group of reads and writes that need to hang together. If I transfer $100 from account A to B, can someone else observe A’s debit but not B’s credit? The strongest answer here is serializability: the group of operations behaves as if it ran alone, with no interleaving.

These are different dimensions. A database can be serializable without being linearizable, and vice versa. The pairing — linearizable and serializable — is called strict serializability, and it is what most people mean when they say “strongly consistent.” Distinguishing the axes is the first step to having an honest conversation.

Linearizability

Linearizability is the strongest per-object model. Each read and write on a single object appears to take effect atomically at some moment between its start and its completion, and that ordering is consistent with real time. Once a write completes, every subsequent read — from any client, anywhere — sees that write or a later one. There is no window in which different clients see different values.

This is the model most developers assume by default. It is the model a single-node database provides for free. In a distributed system it is expensive: it requires consensus (Raft, Paxos, or equivalent) on every write, and consensus requires a quorum of nodes to agree. That is a round trip, usually across a network, for every operation.

Systems that provide linearizability advertise it as “strong consistency”: CockroachDB, Spanner, etcd, ZooKeeper. You pay for it in latency and availability — a partition that cuts off a quorum stalls writes — and the bill is real. Use linearizability for the data that genuinely requires it (leader election, configuration, distributed locks, financial transactions where correctness is unconditional) and notice that most application data is not on that list.
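The quorum requirement can be sketched in a few lines. This is a toy model, not any real consensus implementation: it captures only the rule that a write commits when a strict majority of replicas acknowledge it, which is exactly why losing a quorum stalls writes.

```python
# Toy quorum model (illustrative; real consensus adds leaders, terms, logs).

def quorum_write(replicas_reachable: int, total_replicas: int) -> bool:
    """A write commits only when a strict majority of replicas ack it."""
    return replicas_reachable > total_replicas // 2

# Healthy cluster of 5: every write can assemble a quorum.
assert quorum_write(5, 5)

# A partition isolates 2 of 5 nodes: the majority side (3) still commits...
assert quorum_write(3, 5)

# ...but the minority side (2) stalls, trading availability for consistency.
assert not quorum_write(2, 5)
```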

Serializability

Serializability is the strongest multi-object model. A set of transactions — each containing multiple reads and writes — is serializable if there is some sequential ordering of them that produces the same result. The transactions appear to run one at a time, in some order, with no overlap.

Note what serializability does not require: it does not require that order to match real time. Two transactions that happened simultaneously might be serialized in either order. A transaction that started first might be ordered after one that started later. Serializability says “there exists a serial order”; it does not say “the serial order matches the wall clock.”

That is why serializable ≠ linearizable. A database can give you serializable transactions while still returning a stale read to a client that just wrote — the transaction is serializable, but it was ordered before the write. Whether this matters depends on the application; for most business workflows, it does not.
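A minimal MVCC sketch makes the gap concrete (illustrative only, not any real engine): a reader whose snapshot predates a concurrent commit returns the old value, and the execution is still serializable because the reader can be ordered first.

```python
# Minimal multi-version store: each key maps to [(commit_ts, value), ...]
# and a transaction reads from the snapshot taken when it began.

class Store:
    def __init__(self):
        self.versions = {"x": [(0, "old")]}
        self.ts = 0

    def begin(self):
        return self.ts  # snapshot timestamp

    def read(self, snapshot_ts, key):
        # Latest version committed at or before the snapshot.
        return max(v for v in self.versions[key] if v[0] <= snapshot_ts)[1]

    def write(self, key, value):
        self.ts += 1
        self.versions[key].append((self.ts, value))

db = Store()
reader_snapshot = db.begin()   # reader's transaction begins...
db.write("x", "new")           # ...then a write commits, later in real time
stale = db.read(reader_snapshot, "x")
assert stale == "old"  # serializable (reader ordered first), not linearizable
```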

Strict serializability combines the two: transactions are serializable and the serial order respects real-time order. This is what Spanner advertises as “external consistency” and what most people imagine when they hear “ACID.” On a single node, serializability and strict serializability coincide (there is one machine, so serial order and wall-clock order agree), which is why the distinction only becomes interesting in the distributed case.

Snapshot isolation

Snapshot isolation is the most widely deployed isolation level that is not serializable, and the one most production systems actually run under, regardless of what their documentation claims.

Under snapshot isolation, each transaction sees a consistent snapshot of the database as of its start time. Reads within the transaction are all from that snapshot; writes become visible atomically at commit. Two concurrent transactions that write to the same row will conflict (one will be aborted), but two concurrent transactions that write to different rows will both succeed — even if a serializable execution would have required one of them to see the other’s write.

The classic anomaly is write skew. Two doctors are on call. Each one independently checks “is at least one other doctor on call?” — sees yes, sees the other — and each marks themselves off call. Both commits succeed. Now zero doctors are on call. The invariant that should have held did not, because neither transaction’s write conflicted with the other’s read, and snapshot isolation only prevents write-write conflicts.
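The doctors scenario can be simulated in a few lines of plain Python. The only rule modeled below is the textbook first-committer-wins write-write conflict check; everything else is illustrative.

```python
# Write skew under snapshot isolation, in miniature. Snapshot isolation
# aborts a transaction only when WRITE sets overlap; reads are invisible.

oncall = {"alice": True, "bob": True}

def run_txn(me, snapshot):
    """Each doctor checks the snapshot, then writes only their own row."""
    others_on_call = any(v for k, v in snapshot.items() if k != me)
    if others_on_call:
        return {me: False}  # write set: mark myself off call
    return {}

# Both transactions read the same snapshot, then try to commit.
snapshot = dict(oncall)
writes_a = run_txn("alice", snapshot)
writes_b = run_txn("bob", snapshot)

# Disjoint write sets, so the first-committer-wins check passes for both.
assert not (writes_a.keys() & writes_b.keys())
oncall.update(writes_a)
oncall.update(writes_b)

# The invariant "at least one doctor on call" is now broken.
assert not any(oncall.values())
```

A serializable execution would have had to order one transaction after the other, so the second would have seen the first's write and kept its doctor on call.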

Postgres’s default is read committed (weaker than snapshot isolation). Its REPEATABLE READ level is snapshot isolation. SERIALIZABLE is genuinely serializable (via SSI — serializable snapshot isolation), at some runtime cost. MySQL’s REPEATABLE READ is snapshot isolation but does not prevent all phantom reads. Oracle’s SERIALIZABLE is actually snapshot isolation, not serializable — a naming choice that has cost the industry uncountable bugs.

If you rely on an isolation level, verify what your database actually provides at that level. Do not trust the name.

Causal consistency

Drop down from linearizability and you enter the world of weaker models that are still useful. Causal consistency is the most important of them.

A causally consistent system preserves the order of operations that could have caused each other. If I post a comment, and then you reply to my comment, anyone else who sees your reply must also see my original comment. The system does not guarantee a total order — two unrelated posts may appear in different orders to different readers — but it guarantees that cause precedes effect, for any pair where causality exists.

Causal consistency is strong enough for most collaborative applications: chat, comments, document editing, shared feeds. It is weak enough to be implemented without consensus on every operation — typically with vector clocks, or by tagging each event with the identifiers of the events that caused it. The result is a system that can continue to serve reads and writes on either side of a network partition without violating causality, and that only becomes eventually consistent about unrelated operations.
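The vector-clock comparison at the heart of this can be sketched as follows; real systems add actor identity, clock pruning, and delivery buffering that are omitted here.

```python
# Vector clocks: each event carries a map of node -> counter. Event a
# causally precedes event b iff a's clock is <= b's everywhere and < somewhere.

def happened_before(a: dict, b: dict) -> bool:
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

# I post a comment on node "n1"...
comment = {"n1": 1}
# ...you read it and reply on node "n2": your clock merges mine, then ticks.
reply = {"n1": 1, "n2": 1}
# An unrelated post on node "n3" carries no order relative to either.
unrelated = {"n3": 1}

assert happened_before(comment, reply)           # cause precedes effect
assert not happened_before(unrelated, reply)     # neither precedes the
assert not happened_before(reply, unrelated)     # other: concurrent events
```

A causally consistent replica uses this check as a delivery rule: it buffers the reply until every event that happened before it, here the comment, has been applied.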

This is the model most people actually want when they reach for “strong consistency” but do not want to pay the latency. Social feeds, chat apps, and most user-visible collaboration features are well served by causal consistency plus session guarantees.

Session guarantees

Sitting between causal and eventual are four weak but practically crucial guarantees, often called the session guarantees:

  • Read-your-writes. If I wrote X, my subsequent reads of X see at least that write. Different clients have no such guarantee between them.
  • Monotonic reads. My reads never go backward in time. If I saw X=5, I will not subsequently see X=3.
  • Monotonic writes. My writes are applied in the order I issued them.
  • Writes follow reads. A write I issue after reading a value will be ordered after the write that produced that value.

Each of these is weaker than causal consistency but individually cheap to implement — usually with session affinity (pin a client’s requests to a specific replica) or with version tokens passed between client and server. Systems that feel “broken” — a user updates their profile and the next page still shows the old name — are usually systems that have not implemented read-your-writes. Fixing that one guarantee is often the difference between “feels buggy” and “feels fine,” even if the underlying model stays weak.
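Read-your-writes with a version token can be sketched like this. The plain integer token is an assumption for illustration; real systems use log sequence numbers or hybrid timestamps.

```python
# Version-token sketch of read-your-writes: the client keeps the version
# its write produced, and a replica refuses reads until it has caught up.

class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, value, version):
        if version > self.version:
            self.value, self.version = value, version

    def read(self, min_version):
        """Serve only if this replica is at least as new as the token."""
        if self.version < min_version:
            raise RuntimeError("replica too stale; retry elsewhere or wait")
        return self.value

primary, lagging = Replica(), Replica()
token = 1
primary.apply("new name", token)   # client writes, keeps the token

# The lagging replica has not replicated yet: it refuses rather than
# showing the old profile, so the client never sees its write "undone".
try:
    lagging.read(min_version=token)
    stale_served = True
except RuntimeError:
    stale_served = False
assert not stale_served

lagging.apply("new name", token)   # replication catches up
assert lagging.read(min_version=token) == "new name"
```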

Dynamo-style systems (DynamoDB, Cassandra, Riak) and eventually consistent stores in general let the client turn these on selectively, because they are cheap and they cover the most common user-visible failure modes.

Eventual consistency

The weakest useful model. All replicas, given enough time with no new writes, converge to the same value. That is it. It says nothing about how long “eventually” is, nothing about the order in which writes become visible, nothing about whether a client sees its own writes in the next millisecond.

Eventual consistency is what you get by default with asynchronous replication, with most event-driven architectures, with multi-region DNS, with CDNs. It is useful for read-heavy workloads where staleness of a few seconds does not matter: product catalogs, static content, analytics dashboards, social-feed backfills, cached recommendations.
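Convergence itself is easy to picture with last-writer-wins registers and pairwise anti-entropy, sketched below. LWW is only one convergence rule among several (CRDTs offer richer ones), and the timestamps here are logical and illustrative.

```python
# Last-writer-wins convergence: each replica holds (timestamp, value),
# and anti-entropy merges pairs by keeping the newer one.

def merge(a, b):
    """Tuple comparison orders by timestamp first, so max picks the newer."""
    return max(a, b)

r1 = (1, "blue")    # replica 1 last saw a write at t=1
r2 = (2, "green")   # replica 2 saw a later write at t=2

# Before anti-entropy runs, readers of r1 and r2 see different values.
assert r1[1] != r2[1]

# After a round of exchanges, with no new writes, the replicas agree.
r1 = merge(r1, r2)
r2 = merge(r2, r1)
assert r1 == r2 == (2, "green")
```

The model promises exactly this and no more: agreement after quiescence, with no bound on when and no say in what readers observe along the way.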

The mistake is to ship an eventually consistent system without knowing it. A service with a database and a Redis cache is eventually consistent — the cache can be stale — but most applications treat the combined system as strongly consistent until a bug proves otherwise. The honest move is to decide where in the system you are giving up which guarantees and to design the user experience around that choice.

What to pick

A rough decision order, from strongest defended to weakest:

  1. Strict serializability. You are running a payment system, a ledger, a leader-election service, a uniqueness check at global scale. The cost of a linearizability violation is a correctness bug the business cannot absorb. Use a consensus-backed store; accept the latency.
  2. Serializable transactions (local), eventual between services. You are running a typical business application. Within one service, transactions are serializable (often via the database’s default or its serializable isolation level). Between services, you are eventually consistent, coordinated by events and sagas. This is the microservices default, and it is the right default for most applications.
  3. Snapshot isolation, with awareness. You are running a high-throughput OLTP workload where serializable isolation costs too much. Snapshot isolation is fine if you have audited the places where write skew can cause an invariant to fail and have closed them (with SELECT ... FOR UPDATE, explicit constraints, or serializable transactions for the critical ones).
  4. Causal + session guarantees. You are running a collaborative application — chat, comments, documents, feeds — at scale. Causal consistency covers the correctness requirement; session guarantees cover the user-perception requirement.
  5. Eventual. You are serving data that can tolerate staleness for the sake of availability and throughput. You have designed the UX around the possibility that two users see different things for a while.

The test, at each level: can I name the specific invariant that would break if I went one step weaker? If the answer is yes, stay. If the answer is no, go weaker. Stronger-than-necessary consistency is not free; it costs latency, money, and operational fragility, and it encourages the rest of the system to assume guarantees that the edges cannot keep.

What CAP actually says

With the model map in place, CAP is small. During a network partition — when the nodes of a distributed system cannot all communicate — you cannot provide both linearizability and availability to every client. You have to pick one per request: stall (give up availability) or serve possibly stale data (give up linearizability).

That is a real constraint and it is not the one that usually matters. During normal operation, you have all three: consistency, availability, and partition tolerance (trivially, because there is no partition). During a partition, most systems are partially available for some operations and unavailable for others, and “consistency” means one of the dozen models above. PACELC is a better framing: during a partition (P), choose between availability (A) and consistency (C); else (E), choose between latency (L) and consistency (C). The second clause is what you are actually negotiating most of the time, and it is where the design decisions happen.

The rule, once more

Name the consistency model you are providing. Name the one your dependencies provide. Name the one your application assumes. When those three disagree, you have a bug — either in the code or in the documentation.

A system that claims to be strongly consistent and is actually snapshot isolated will corrupt invariants. A system that is actually eventually consistent but treated by its clients as strongly consistent will show users stale data in ways that look like bugs. A system that is linearizable when it only needed to be causal is spending latency it did not have to. Matching the model to the need is the whole discipline.

Pick the weakest one you can defend. Defend it with specifics — the invariant, the user expectation, the business rule — not with “we need it to be fast” or “we need it to be correct.” Those are slogans; the models are the vocabulary that makes them precise.