Serializable Snapshot Isolation - Part 1
2024-12-10
Snapshot Isolation
Let’s recap how Snapshot Isolations works first.
The concept of SI is quite simple: A transaction only see data that has been committed before it starts. Even if a newer version data has been committed(updated) by another transaction, it can not see this update. Unless, the update is done by this transaction itself. Upon commit, it will abort if any conflict detected.
Internal Tuple Fields
Everytime a transaction begins, an unique uint_32
monotonic increasing* transaction id (txid) will assigned to it.
Tuples (table rows) have some extra fields (columns) that is related to txid and normally not visiable by user, Here’s the fields that PostgreSQL used for SI:
t_min
: the transaction ID (txid) of the transaction that inserted this tuple.t_max
: the transaction that deleted or updated this tuple. 0 mean this tuple has not been deleted or updated yet.t_cid
: is the number of command executed within current transaction, start from 0.t_ctid
: current tuple identifier, a pointer points to itself or its update.
Next we take a look how t_min
, t_max
, t_cid
, and t_ctid
might change during typical CRUD operations in PostgreSQL.
Initial Setup
Let’s assume we start with a simple table:
CREATE TABLE example (
id SERIAL PRIMARY KEY,
value INTEGER
);
Initial state (no rows):
ctid |
value |
xmin |
xmax |
t_ctid |
Notes |
---|---|---|---|---|---|
(none) | (none) | (none) | (none) | (none) | Table is empty. |
Step 1: Insert (Create)
INSERT INTO example (value) VALUES (10);
- Action: A new row is inserted.
- Effect:
xmin
is set to the transaction ID of the insert,xmax
isNULL
(indicating the row is valid), andt_ctid
isNULL
(no new version).
ctid |
value |
xmin |
xmax |
t_ctid |
Notes |
---|---|---|---|---|---|
(0,1) |
10 |
100 |
NULL |
NULL |
Row inserted by xid=100 . |
Step 2: Read
SELECT * FROM example WHERE value = 10;
- Action: The row is read.
- Effect: No change to the internal metadata (
xmin
,xmax
,t_ctid
) because reads do not modify the row. Visibility check will be discussed later.
ctid |
value |
xmin |
xmax |
t_ctid |
Notes |
---|---|---|---|---|---|
(0,1) |
10 |
100 |
NULL |
NULL |
Row read. No changes. |
Step 3: Update
UPDATE example SET value = 20 WHERE value = 10;
- Action: The row is updated. PostgreSQL creates a new row version for the update, links the old row to the new row via
t_ctid
, and setsxmax
for the old row to the current transaction ID. The new row’sxmin
is the current transaction ID, andt_ctid
isNULL
for the new row.
ctid |
value |
xmin |
xmax |
t_ctid |
Notes |
---|---|---|---|---|---|
(0,1) |
10 |
100 |
200 |
(0,2) |
Old version marked deleted (xid=200 ). |
(0,2) |
20 |
200 |
NULL |
NULL |
New version inserted (xid=200 ). |
Step 4: Delete
DELETE FROM example WHERE value = 20;
- Action: The row is deleted. PostgreSQL marks the row as invalid by setting its
xmax
to the current transaction ID (300
). No new version is created, sot_ctid
remainsNULL
.
ctid |
value |
xmin |
xmax |
t_ctid |
Notes |
---|---|---|---|---|---|
(0,1) |
10 |
100 |
200 |
(0,2) |
Old version remains unchanged. |
(0,2) |
20 |
200 |
300 |
NULL |
Row marked deleted (xid=300 ). |
Behavior Recap
- Insert: Creates a new row with
xmin
set andxmax
/t_ctid
asNULL
. - Read: Does not modify
xmin
,xmax
, ort_ctid
. - Update: Marks the old version with
xmax
and creates a new version linked viat_ctid
. - Delete: Marks the row as invalid by setting
xmax
.
Visibility Check
After we know how CRUD modified these fields, let’s have a look how a transaction determine whether it can see a tuple or not.
Commit Log:
PostgreSQL use Commit Log (clog) to keep transaction statuses. Think clog as an array of state and the index it the txid. For example transaction 11 is running, so clog[11] == IN_PROGRESS
, and when it’s finished, it will set clog[11] = COMMITTED
or clog[11] = ABORTED
.
Snapshot:
In PostgreSQL, upon transaction begin, it gets a snapshot. Unless the isolation level is READ COMMITTED, the transaction will get a snapshot upon every SQL execution. The version number of a snapshot looks like xmin:xmax
. Every txid less than xmin is done, so called inactive, either committed or aborted, those committed are visible; Every txid greater than xmax is not assigned yet, considered as active and thus invisible. txid between xmin and xmax could be done or in progress, this is the small range you will need extra check.
Rules:
if status(t_xmin) == ABORTED: INVISIBLE;
RULE 1: It is obviously that tuple inserted by an aborted transaction is invisible.if status(t_xmin) == IN_PROGRESS:
if t_xmin != curr_txid: INVISIBLE;
RULE 2: This tuple is inserted by another tx and its in progress, thus invisible.if t_xmin == curr_txid && t_xmax != 0: INVISIBLE;
RULE 3: This tuple has been updated by this transaction, thus invisible.if t_xmin == curr_txid && t_xmax == 0: VISIBLE;
RULE 4: This tuple is the newest version inserted by this transaction itself, thus visible.
if status(t_xmin) == COMMITTED:
if snapshot_status(t_xmin) == ACTIVE: INVISIBLE;
RULE 5: Even if this tuple is inserted by a comitted transaction, but it’s considering active when taking the snapshot, thus invisible.elif t_xmax == INVALID || status(t_xmax) == ABORTED: VISIBLE
RULE 6: The tuple is inserted by a comitted transaction, and not been updated or the transaction that updated this tuple aborted, thus visible.elif status(t_xmax) == IN_PROGRESS:
if t_xmax == curr_txid: INVISIBLE;
RULE 7: This tuple have a newer version insterted by this transaction and thus invisible.else: VISIBLE;
RULE 8: Here t_xmin is committed and t_xmin is in progress, it is still visible.
elif status(t_xmax) == COMMITTED:
if snapshot_status(t_xmax) == ACTIVE: VISIBLE;
RULE 9: Similar to 3.1, t_xmax is considering active when taking the snapshot, and thus visible.else: INVISIBLE;
RULE 10: Both t_xmin and t_max are committed here, means this tuple is already updated or deleted by a committed transaction, and thus invisible.
Put these together will be something like:
bool is_tuple_visible(Tuple *tuple, TransactionId curr_txid, Snapshot *snapshot)
{
if (status(tuple->t_xmin) == ABORTED)
return false;
if (status(tuple->t_xmin) == IN_PROGRESS) {
if (tuple->t_xmin != curr_txid)
return false;
return (tuple->t_xmax == 0);
}
if (status(tuple->t_xmin) == COMMITTED) {
if (snapshot_status(tuple->t_xmin) == ACTIVE)
return false;
if (tuple->t_xmax == INVALID || status(tuple->t_xmax) == ABORTED)
return true;
if (status(tuple->t_xmax) == IN_PROGRESS)
return (tuple->t_xmax != curr_txid);
if (status(tuple->t_xmax) == COMMITTED)
return (snapshot_status(tuple->t_xmax) == ACTIVE);
}
return false; // Default case, should not reach here
}
These rules covered how visibility check applied in PostgreSQL in most cases, except for subtransactions. You might wonder how different transactions under different isolation level apply these rules. Well it’s quite simple:
- READ UNCOMMITTED: not implemented. Choosing this is same as choosing READ COMMITTED.
- READ COMMITTED: get a new snapshot on every command.
- REPEATABLE READ: only get a snapshot at the begining of the transaction.
- SERIALIZABLE: same as REPEATABLE READ, but apply additional rules to check if transaction violate constraints.
In next part, we will see how SSI works.