Serializable Snapshot Isolation - Part 1

󰃭 2024-12-10

title: ‘Serializable Snapshot Isolation - Part 1’ date: 2024-12-10T05:33:08Z tags: - Database

summary: description: How Serializable Snapshot Isolation works and is implemented in PostgreSQL.

Snapshot Isolation

Let us recap how Snapshot Isolations works first.

The concept of SI is quite simple: A transaction only sees data that has been committed before it starts. Even if a newer version of data has been committed(updated) by another transaction, it can not see this update unless the update is done by this transaction itself. Upon commit, it will abort if any conflict is detected.

Internal Tuple Fields

Every time a transaction begins, a unique uint_32 monotonic increasing* transaction ID (txid) is assigned to it. Tuples (table rows) have some extra fields (columns) that are related to txid and normally not visible to the user. Here are the fields that PostgreSQL used for SI:

t_min: the transaction ID (txid) of the transaction that inserted this tuple.
t_max: the transaction that deleted or updated this tuple. 0 means this tuple has not been deleted or updated yet.
t_cid: is the number of the commands executed within the current transaction, starting from 0.
t_ctid: current tuple identifier, a pointer points to itself or its update.

Next, we examine how t_min, t_max, t_cid, and t_ctid might change during typical CRUD operations in PostgreSQL.

Initial Setup

Let’s assume we start with a simple table:

CREATE TABLE example (
    id SERIAL PRIMARY KEY,
    value INTEGER
);

Initial state (no rows):

`ctid`	`value`	`xmin`	`xmax`	`t_ctid`	Notes
(none)	(none)	(none)	(none)	(none)	Table is empty.

Step 1: Insert (Create)

INSERT INTO example (value) VALUES (10);

Action: A new row is inserted.
Effect: xmin is set to the transaction ID of the insert, xmax is NULL (indicating the row is valid), and t_ctid is NULL (no new version).

`ctid`	`value`	`xmin`	`xmax`	`t_ctid`	Notes
`(0,1)`	`10`	`100`	`NULL`	`NULL`	Row inserted by `xid=100`.

Step 2: Read

SELECT * FROM example WHERE value = 10;

Action: The row is read.
Effect: No change to the internal metadata (xmin, xmax, t_ctid) because reads do not modify the row. The visibility check will be discussed later.

`ctid`	`value`	`xmin`	`xmax`	`t_ctid`	Notes
`(0,1)`	`10`	`100`	`NULL`	`NULL`	Row read. No changes.

Step 3: Update

UPDATE example SET value = 20 WHERE value = 10;

Action: The row is updated. PostgreSQL creates a new row version for the update, links the old row to the new row via t_ctid, and sets xmax for the old row to the current transaction ID. The new row’s xmin is the current transaction ID, and t_ctid is NULL for the new row.

`ctid`	`value`	`xmin`	`xmax`	`t_ctid`	Notes
`(0,1)`	`10`	`100`	`200`	`(0,2)`	Old version marked deleted (`xid=200`).
`(0,2)`	`20`	`200`	`NULL`	`NULL`	New version inserted (`xid=200`).

Step 4: Delete

DELETE FROM example WHERE value = 20;

Action: The row is deleted. PostgreSQL marks the row as invalid by setting its xmax to the current transaction ID (300). No new version is created, so t_ctid remains NULL.

`ctid`	`value`	`xmin`	`xmax`	`t_ctid`	Notes
`(0,1)`	`10`	`100`	`200`	`(0,2)`	Old version remains unchanged.
`(0,2)`	`20`	`200`	`300`	`NULL`	Row marked deleted (`xid=300`).

Behavior Recap

Insert: Creates a new row with xmin set xmax and t_ctid as NULL.
Read: Does not modify xmin, xmax, or t_ctid.
Update: Marks the old version with xmax and creates a new version linked via t_ctid.
Delete: Marks the row as invalid by setting xmax.

Visibility Check

After we know how CRUD modified these fields, let’s examine how a transaction determines whether it can see a tuple.

Commit Log:

PostgreSQL uses Commit Log (clog) to keep transaction statuses. Think of clog as an array of states, and the index is the txid. For example, transaction 11 is running, so clog[11] == IN_PROGRESS, and when it’s finished, it will set clog[11] = COMMITTED or clog[11] = ABORTED.

Snapshot:

In PostgreSQL, upon transaction begin, it gets a snapshot. Unless the isolation level is READ COMMITTED, the transaction will get a snapshot upon every SQL execution. The version number of a snapshot looks like xmin:xmax. Every txid less than xmin is done, so-called inactive, either committed or aborted; those committed are visible. Every txid greater than xmax is not assigned yet, considered active and thus invisible. txid between xmin and xmax could be done or in progress, this is the small range you will need extra check.

Rules:

if status(t_xmin) == ABORTED: INVISIBLE; RULE 1: It is obvious that the tuple inserted by an aborted transaction is invisible.
if status(t_xmin) == IN_PROGRESS:
1. if t_xmin != curr_txid: INVISIBLE; RULE 2: This tuple is inserted by another tx and is in progress, thus invisible.
2. if t_xmin == curr_txid && t_xmax != 0: INVISIBLE; RULE 3: This tuple has been updated by this transaction, thus invisible.
3. if t_xmin == curr_txid && t_xmax == 0: VISIBLE; RULE 4: This tuple is the newest version inserted by this transaction itself, thus visible.
if status(t_xmin) == COMMITTED:
1. if snapshot_status(t_xmin) == ACTIVE: INVISIBLE; RULE 5: Even if this tuple is inserted by a committed transaction, it’s considered active when taking the snapshot, thus invisible.
2. elif t_xmax == INVALID || status(t_xmax) == ABORTED: VISIBLE RULE 6: The tuple is inserted by a committed transaction, and not been updated or the transaction that updated this tuple aborted, thus visible.
3. elif status(t_xmax) == IN_PROGRESS:
  1. if t_xmax == curr_txid: INVISIBLE; RULE 7: This tuple has a newer version insterted by this transaction and thus invisible.
  2. else: VISIBLE; RULE 8: Here t_xmin is committed and t_xmin is in progress, it is still visible.
4. elif status(t_xmax) == COMMITTED:
  1. if snapshot_status(t_xmax) == ACTIVE: VISIBLE; RULE 9: Similar to 3.1, t_xmax is considered active when taking the snapshot, and thus visible.
  2. else: INVISIBLE; RULE 10: Both t_xmin and t_max are committed here, means this tuple is already updated or deleted by a committed transaction, and thus invisible.

Put these together will be something like:

bool is_tuple_visible(Tuple *tuple, TransactionId curr_txid, Snapshot *snapshot)
{
    if (status(tuple->t_xmin) == ABORTED)
        return false;

    if (status(tuple->t_xmin) == IN_PROGRESS) {
        if (tuple->t_xmin != curr_txid)
            return false;
        return (tuple->t_xmax == 0);
    }

    if (status(tuple->t_xmin) == COMMITTED) {
        if (snapshot_status(tuple->t_xmin) == ACTIVE)
            return false;

        if (tuple->t_xmax == INVALID || status(tuple->t_xmax) == ABORTED)
            return true;

        if (status(tuple->t_xmax) == IN_PROGRESS)
            return (tuple->t_xmax != curr_txid);

        if (status(tuple->t_xmax) == COMMITTED)
            return (snapshot_status(tuple->t_xmax) == ACTIVE);
    }

    return false;  // Default case, should not reach here
}

These rules covered how visibility checks are applied in PostgreSQL in most cases, except for subtransactions. You might wonder how different transactions under different isolation levels apply these rules. Well, it is quite simple:

READ UNCOMMITTED: Not implemented. Choosing this is the same as choosing READ COMMITTED.
READ COMMITTED: Get a new snapshot on every command.
REPEATABLE READ: Only get a snapshot at the beginning of the transaction.
SERIALIZABLE: Same as REPEATABLE READ, but apply additional rules to check if the transaction violates constraints.

In the next part, we will see how SSI works.