Serializable Snapshot Isolation - Part 1

󰃭 2024-12-10

Snapshot Isolation

Let’s recap how Snapshot Isolations works first.

The concept of SI is quite simple: A transaction only see data that has been committed before it starts. Even if a newer version data has been committed(updated) by another transaction, it can not see this update. Unless, the update is done by this transaction itself. Upon commit, it will abort if any conflict detected.

Internal Tuple Fields

Everytime a transaction begins, an unique uint_32 monotonic increasing* transaction id (txid) will assigned to it. Tuples (table rows) have some extra fields (columns) that is related to txid and normally not visiable by user, Here’s the fields that PostgreSQL used for SI:

  • t_min: the transaction ID (txid) of the transaction that inserted this tuple.
  • t_max: the transaction that deleted or updated this tuple. 0 mean this tuple has not been deleted or updated yet.
  • t_cid: is the number of command executed within current transaction, start from 0.
  • t_ctid: current tuple identifier, a pointer points to itself or its update.

Next we take a look how t_min, t_max, t_cid, and t_ctid might change during typical CRUD operations in PostgreSQL.


Initial Setup

Let’s assume we start with a simple table:

CREATE TABLE example (
    id SERIAL PRIMARY KEY,
    value INTEGER
);

Initial state (no rows):

ctid value xmin xmax t_ctid Notes
(none) (none) (none) (none) (none) Table is empty.

Step 1: Insert (Create)

INSERT INTO example (value) VALUES (10);
  • Action: A new row is inserted.
  • Effect: xmin is set to the transaction ID of the insert, xmax is NULL (indicating the row is valid), and t_ctid is NULL (no new version).
ctid value xmin xmax t_ctid Notes
(0,1) 10 100 NULL NULL Row inserted by xid=100.

Step 2: Read

SELECT * FROM example WHERE value = 10;
  • Action: The row is read.
  • Effect: No change to the internal metadata (xmin, xmax, t_ctid) because reads do not modify the row. Visibility check will be discussed later.
ctid value xmin xmax t_ctid Notes
(0,1) 10 100 NULL NULL Row read. No changes.

Step 3: Update

UPDATE example SET value = 20 WHERE value = 10;
  • Action: The row is updated. PostgreSQL creates a new row version for the update, links the old row to the new row via t_ctid, and sets xmax for the old row to the current transaction ID. The new row’s xmin is the current transaction ID, and t_ctid is NULL for the new row.
ctid value xmin xmax t_ctid Notes
(0,1) 10 100 200 (0,2) Old version marked deleted (xid=200).
(0,2) 20 200 NULL NULL New version inserted (xid=200).

Step 4: Delete

DELETE FROM example WHERE value = 20;
  • Action: The row is deleted. PostgreSQL marks the row as invalid by setting its xmax to the current transaction ID (300). No new version is created, so t_ctid remains NULL.
ctid value xmin xmax t_ctid Notes
(0,1) 10 100 200 (0,2) Old version remains unchanged.
(0,2) 20 200 300 NULL Row marked deleted (xid=300).

Behavior Recap

  • Insert: Creates a new row with xmin set and xmax/t_ctid as NULL.
  • Read: Does not modify xmin, xmax, or t_ctid.
  • Update: Marks the old version with xmax and creates a new version linked via t_ctid.
  • Delete: Marks the row as invalid by setting xmax.

Visibility Check

After we know how CRUD modified these fields, let’s have a look how a transaction determine whether it can see a tuple or not.

Commit Log:

PostgreSQL use Commit Log (clog) to keep transaction statuses. Think clog as an array of state and the index it the txid. For example transaction 11 is running, so clog[11] == IN_PROGRESS, and when it’s finished, it will set clog[11] = COMMITTED or clog[11] = ABORTED.

Snapshot:

In PostgreSQL, upon transaction begin, it gets a snapshot. Unless the isolation level is READ COMMITTED, the transaction will get a snapshot upon every SQL execution. The version number of a snapshot looks like xmin:xmax. Every txid less than xmin is done, so called inactive, either committed or aborted, those committed are visible; Every txid greater than xmax is not assigned yet, considered as active and thus invisible. txid between xmin and xmax could be done or in progress, this is the small range you will need extra check.

Rules:

  1. if status(t_xmin) == ABORTED: INVISIBLE; RULE 1: It is obviously that tuple inserted by an aborted transaction is invisible.
  2. if status(t_xmin) == IN_PROGRESS:
    1. if t_xmin != curr_txid: INVISIBLE; RULE 2: This tuple is inserted by another tx and its in progress, thus invisible.
    2. if t_xmin == curr_txid && t_xmax != 0: INVISIBLE; RULE 3: This tuple has been updated by this transaction, thus invisible.
    3. if t_xmin == curr_txid && t_xmax == 0: VISIBLE; RULE 4: This tuple is the newest version inserted by this transaction itself, thus visible.
  3. if status(t_xmin) == COMMITTED:
    1. if snapshot_status(t_xmin) == ACTIVE: INVISIBLE; RULE 5: Even if this tuple is inserted by a comitted transaction, but it’s considering active when taking the snapshot, thus invisible.
    2. elif t_xmax == INVALID || status(t_xmax) == ABORTED: VISIBLE RULE 6: The tuple is inserted by a comitted transaction, and not been updated or the transaction that updated this tuple aborted, thus visible.
    3. elif status(t_xmax) == IN_PROGRESS:
      1. if t_xmax == curr_txid: INVISIBLE; RULE 7: This tuple have a newer version insterted by this transaction and thus invisible.
      2. else: VISIBLE; RULE 8: Here t_xmin is committed and t_xmin is in progress, it is still visible.
    4. elif status(t_xmax) == COMMITTED:
      1. if snapshot_status(t_xmax) == ACTIVE: VISIBLE; RULE 9: Similar to 3.1, t_xmax is considering active when taking the snapshot, and thus visible.
      2. else: INVISIBLE; RULE 10: Both t_xmin and t_max are committed here, means this tuple is already updated or deleted by a committed transaction, and thus invisible.

Put these together will be something like:

bool is_tuple_visible(Tuple *tuple, TransactionId curr_txid, Snapshot *snapshot)
{
    if (status(tuple->t_xmin) == ABORTED)
        return false;

    if (status(tuple->t_xmin) == IN_PROGRESS) {
        if (tuple->t_xmin != curr_txid)
            return false;
        return (tuple->t_xmax == 0);
    }

    if (status(tuple->t_xmin) == COMMITTED) {
        if (snapshot_status(tuple->t_xmin) == ACTIVE)
            return false;

        if (tuple->t_xmax == INVALID || status(tuple->t_xmax) == ABORTED)
            return true;

        if (status(tuple->t_xmax) == IN_PROGRESS)
            return (tuple->t_xmax != curr_txid);

        if (status(tuple->t_xmax) == COMMITTED)
            return (snapshot_status(tuple->t_xmax) == ACTIVE);
    }

    return false;  // Default case, should not reach here
}

These rules covered how visibility check applied in PostgreSQL in most cases, except for subtransactions. You might wonder how different transactions under different isolation level apply these rules. Well it’s quite simple:

  1. READ UNCOMMITTED: not implemented. Choosing this is same as choosing READ COMMITTED.
  2. READ COMMITTED: get a new snapshot on every command.
  3. REPEATABLE READ: only get a snapshot at the begining of the transaction.
  4. SERIALIZABLE: same as REPEATABLE READ, but apply additional rules to check if transaction violate constraints.

In next part, we will see how SSI works.