Transaction Engine
ParticleDB provides full ACID transactions using multi-version concurrency control
(MVCC) with snapshot isolation. The transaction engine supports three WAL
synchronization modes, concurrent insert append, two-phase commit, and row-level locking
with FOR UPDATE / FOR SHARE semantics.
MVCC and Snapshot Isolation
Every transaction operates on a consistent snapshot of the database taken at transaction start time. Readers never block writers, and writers never block readers.
```sql
BEGIN;                 -- snapshot taken at this point
SELECT * FROM orders;  -- sees data as of snapshot time
-- concurrent INSERT into orders by another transaction
SELECT * FROM orders;  -- still sees original snapshot (repeatable read)
COMMIT;
```

Each row version carries a transaction ID indicating which transaction created it. A transaction can only see row versions created by transactions that committed before its snapshot was taken. Uncommitted and later-committed versions are invisible.
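The visibility rule can be sketched as a predicate over row versions. This is a minimal illustrative model, not ParticleDB's internals; `RowVersion` and `is_visible` are made-up names:

```python
from dataclasses import dataclass

@dataclass
class RowVersion:
    creator_txn: int   # ID of the transaction that created this version
    value: str

def is_visible(version, snapshot_committed):
    """Visible only if the creator committed before the snapshot was taken."""
    return version.creator_txn in snapshot_committed

# Snapshot taken after txns 1 and 2 committed; txn 3 is still in flight.
snapshot = {1, 2}
versions = [RowVersion(1, "a"), RowVersion(3, "b")]
visible = [v.value for v in versions if is_visible(v, snapshot)]
assert visible == ["a"]    # txn 3's uncommitted write is invisible
```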
Write Conflicts
If two concurrent transactions attempt to modify the same row, the second writer detects the conflict and aborts. The application can retry the aborted transaction, which will acquire a new snapshot.
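A typical client-side pattern is a retry loop with backoff. The sketch below is illustrative: `ConflictAborted` and `run_with_retry` are hypothetical stand-ins for whatever abort signal your driver exposes, not ParticleDB API names:

```python
import random
import time

class ConflictAborted(Exception):
    """Stand-in for the error raised when the second writer is aborted."""

def run_with_retry(txn_body, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return txn_body()  # BEGIN ... COMMIT; each attempt gets a fresh snapshot
        except ConflictAborted:
            if attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(0, 0.005 * 2 ** attempt))  # jittered backoff

attempts = {"n": 0}

def body():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConflictAborted()   # simulate two conflicts before success
    return "committed"

result = run_with_retry(body)
assert result == "committed" and attempts["n"] == 3
```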
WAL Synchronization Modes
The write-ahead log guarantees durability by persisting every mutation before it is applied in memory. Three synchronization modes let you trade durability for throughput:
sync

```sh
particledb start --wal-sync-mode sync
```

Every WAL entry is followed by an fsync system call. This guarantees zero data loss on crash at the cost of one fsync per write operation. Best for workloads where every transaction must survive a power failure.
groupsync (default)
```sh
particledb start --wal-sync-mode groupsync
```

WAL entries from concurrent transactions are batched together and flushed with a single fsync. This amortizes the cost of fsync across multiple transactions, providing a sub-millisecond durability window. In TPC-C benchmarks, groupsync reaches 175K TPS peak, up to 1.5x the throughput of PostgreSQL 17 at 8 workers.
nosync
```sh
particledb start --wal-sync-mode nosync
```

The WAL is disabled entirely. An AtomicBool flag gates all WAL codepaths: no serialization, no buffer allocation, no fsync. Additional optimizations activate in this mode:

- txn_table_locks skip: the global transaction table-lock mutex is bypassed via an AtomicBool flag, eliminating ~640K mutex acquisitions/sec at 8 workers.
- Thread-local WAL buffer disabled: no buffer allocation or flush.

This mode achieves 159K TPS at 8 workers but provides no crash recovery. Suitable for bulk loading, ephemeral analytics, or workloads with external durability guarantees (e.g., replay from an upstream event stream).
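The flag-gated fast path can be pictured with a toy model. All names here are illustrative, and the real engine's lock skip is safe only because of other synchronization described above; this sketch is single-threaded:

```python
import threading

class Engine:
    """Toy model of nosync's flag-gated fast paths (names are illustrative)."""
    def __init__(self, wal_enabled=True):
        self.wal_enabled = wal_enabled       # stands in for the AtomicBool gate
        self.wal = []
        self.txn_table_lock = threading.Lock()
        self.rows = []

    def write(self, row):
        if self.wal_enabled:
            self.wal.append(("insert", row))  # serialize + buffer the WAL entry
            with self.txn_table_lock:         # global transaction table-lock path
                self.rows.append(row)
        else:
            # nosync: no WAL entry and no mutex acquisition on this path.
            self.rows.append(row)

fast = Engine(wal_enabled=False)
fast.write({"id": 1})
assert fast.wal == []    # the WAL codepath is never reached
```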
Concurrent INSERT Append
INSERT operations use a read lock plus an append lock rather than an exclusive write lock on the table:

```
Traditional: td.write()              -- exclusive lock, blocks all readers
Concurrent:  td.read() + append_lock -- read lock + Mutex on append position
```

try_concurrent_push and try_concurrent_extend append new rows under the shared read lock, with a Mutex serializing only the append position update. This means:
- INSERTs do not block concurrent SELECTs — readers hold the same read lock.
- INSERTs do not block concurrent UPDATEs — updates operate on existing rows.
- Falls back to an exclusive write lock only when the underlying storage needs capacity growth (rare, amortized).
Row counts are maintained with fetch_add (atomic increment) instead of
load + store, which is safe for concurrent appenders.
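The append path can be modeled in miniature. This Python sketch is illustrative only: `AppendOnlyTable`, `concurrent_push`, and `scan` are stand-ins, and the shared read lock is elided because Python's stdlib has no reader-writer lock:

```python
import threading

class AppendOnlyTable:
    """Appenders contend only on the append-position lock; readers take no lock."""
    def __init__(self):
        self.rows = []
        self.append_lock = threading.Lock()  # serializes only the append position
        self.row_count = 0

    def concurrent_push(self, row):
        # Analogous to try_concurrent_push: append under a narrow Mutex,
        # then publish the new row count (fetch_add in the real engine).
        with self.append_lock:
            self.rows.append(row)
            self.row_count += 1

    def scan(self):
        # Readers see the published prefix without blocking appenders.
        return self.rows[:self.row_count]

table = AppendOnlyTable()
threads = [threading.Thread(target=table.concurrent_push, args=(i,))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert table.row_count == 100
```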
Transaction Modes
ParticleDB supports three concurrency-control strategies, selected at server start with the --txn-mode flag:

```sh
particledb start --txn-mode table-2pl  # default
particledb start --txn-mode row-2pl
particledb start --txn-mode fast
```

| Mode | Locking Granularity | Isolation | Best For |
|---|---|---|---|
| table-2pl | Table-level two-phase locking | Serializable | Simplest correctness model (default) |
| row-2pl | Row-level via DashMap (PK-based locks) | Serializable | High-concurrency OLTP (TPC-C) |
| fast | No locking | Read-committed | Maximum throughput, single-writer workloads |
Row-Level Locking with DashMap (row-2pl)
In row-2pl mode, row locks are tracked in a DashMap keyed by primary key value.
DashMap is a concurrent hash map with shard-level locking, providing lock-free read
access and fine-grained write contention — far less overhead than a global table mutex.
Transactions acquire row locks on first access and release them at commit or rollback. The DashMap approach means two transactions modifying different rows in the same table never contend with each other.
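The per-row lock table can be approximated with a sharded dict. This is a Python stand-in for DashMap's shard-level locking; the shard count, hashing, and method names are illustrative:

```python
import threading

class ShardedLockMap:
    """A sharded row-lock table (a stand-in for DashMap's shard-level locking)."""
    def __init__(self, num_shards=16):
        self.shards = [({}, threading.Lock()) for _ in range(num_shards)]

    def _shard(self, key):
        return self.shards[hash(key) % len(self.shards)]

    def try_lock(self, key, txn_id):
        held, guard = self._shard(key)
        with guard:  # contention is per-shard, not per-table
            holder = held.get(key)
            if holder is None or holder == txn_id:
                held[key] = txn_id
                return True
            return False  # row already locked by another transaction

    def release_all(self, txn_id):
        # Called at commit or rollback.
        for held, guard in self.shards:
            with guard:
                for key in [k for k, v in held.items() if v == txn_id]:
                    del held[key]

lock_map = ShardedLockMap()
assert lock_map.try_lock("order:42", txn_id=1)      # first writer wins
assert not lock_map.try_lock("order:42", txn_id=2)  # second writer conflicts
lock_map.release_all(1)                             # commit releases all row locks
assert lock_map.try_lock("order:42", txn_id=2)      # now available
```

Because different primary keys usually hash to different shards, two transactions touching different rows rarely share even a shard lock.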
Non-Persistent Read Barriers
Read transactions use a lightweight wait_for_writer barrier: if a reader's snapshot overlaps with an in-flight writer, the reader waits for the writer to commit or abort rather than acquiring a read lock. This avoids the overhead of shared-lock bookkeeping while still preserving snapshot consistency.
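The barrier amounts to a condition-variable wait. This is an illustrative model; `WriterBarrier` and its methods are stand-ins, not ParticleDB functions:

```python
import threading

class WriterBarrier:
    """Readers overlapping an in-flight writer park until it commits or aborts."""
    def __init__(self):
        self.cond = threading.Condition()
        self.writer_in_flight = False

    def begin_write(self):
        with self.cond:
            self.writer_in_flight = True

    def end_write(self):
        # Commit or abort: wake every parked reader.
        with self.cond:
            self.writer_in_flight = False
            self.cond.notify_all()

    def wait_for_writer(self):
        # No shared-lock bookkeeping; just wait out the overlapping writer.
        with self.cond:
            while self.writer_in_flight:
                self.cond.wait()

barrier = WriterBarrier()
events = []

def reader():
    barrier.wait_for_writer()
    events.append("read")

barrier.begin_write()
t = threading.Thread(target=reader)
t.start()
barrier.end_write()   # writer resolves; the reader proceeds
t.join()
assert events == ["read"]
```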
Group Commit
In groupsync WAL mode, ParticleDB uses a leader/follower group commit pattern:
- The first transaction to reach the WAL flush point becomes the leader.
- Subsequent transactions that arrive while the leader's fsync is in flight register as followers and park on a condition variable.
- When fsync completes, the leader wakes all followers; their WAL entries were included in the same physical flush.
This batches multiple transaction commits into a single fsync syscall. In TPC-C
benchmarks, group commit provides a +76% throughput improvement over per-transaction
sync at 8 workers.
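The leader/follower pattern can be sketched with a condition variable. This is a simplified model under stated assumptions: the "fsync" is a no-op placeholder, and `GroupCommit` and its fields are illustrative names:

```python
import threading

class GroupCommit:
    """Leader/follower group commit: one fsync makes a whole batch durable."""
    def __init__(self):
        self.cond = threading.Condition()
        self.pending = []      # WAL entries queued for the next flush
        self.flushed = []      # durable entries (stand-in for the WAL file)
        self.leader_active = False
        self.fsync_count = 0

    def commit(self, entry):
        with self.cond:
            self.pending.append(entry)
            while entry not in self.flushed:
                if self.leader_active:
                    self.cond.wait()         # follower parks on the condvar
                    continue
                # First to reach the flush point: become the leader.
                self.leader_active = True
                batch, self.pending = self.pending, []
                self.cond.release()          # drop the lock during the "fsync"
                try:
                    pass                     # one write + fsync for the whole batch
                finally:
                    self.cond.acquire()
                self.flushed.extend(batch)   # publish durability under the lock
                self.fsync_count += 1
                self.leader_active = False
                self.cond.notify_all()       # wake followers covered by this flush

committer = GroupCommit()
threads = [threading.Thread(target=committer.commit, args=(f"txn-{i}",))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(committer.flushed) == [f"txn-{i}" for i in range(8)]
```

Followers that arrive while the leader's flush is in flight queue up and are covered by the next flush, so the number of fsyncs is at most (and usually far below) the number of commits.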
Transaction Buffered Operations
All mutations within a transaction are accumulated in a TXN_BUFFERED_OPS buffer rather than being applied immediately:
- INSERTs, UPDATEs, and DELETEs are recorded as buffered operations.
- On COMMIT, the buffer is replayed in order against the storage engine.
- On ROLLBACK, the buffer is discarded with no storage side effects.
This design ensures that partial transactions never leave visible state and simplifies conflict detection — the engine checks the buffer for write-write conflicts at commit time.
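The buffer-then-replay design can be sketched as follows. The class and method names are illustrative, not ParticleDB's actual types:

```python
class Transaction:
    """Mutations buffer in the transaction; COMMIT replays them, ROLLBACK discards."""
    def __init__(self, storage):
        self.storage = storage
        self.buffered_ops = []    # stand-in for TXN_BUFFERED_OPS

    def insert(self, table, row):
        self.buffered_ops.append(("insert", table, row))

    def delete(self, table, row_id):
        self.buffered_ops.append(("delete", table, row_id))

    def commit(self):
        # Write-write conflict checks would inspect buffered_ops here,
        # before any op touches storage.
        for op, table, arg in self.buffered_ops:
            rows = self.storage.setdefault(table, [])
            if op == "insert":
                rows.append(arg)
            elif op == "delete":
                self.storage[table] = [r for r in rows if r["id"] != arg]
        self.buffered_ops.clear()

    def rollback(self):
        self.buffered_ops.clear()   # no storage side effects

storage = {}
txn = Transaction(storage)
txn.insert("orders", {"id": 1})
assert storage == {}                        # invisible until COMMIT
txn.commit()
assert storage == {"orders": [{"id": 1}]}
```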
Engine-Visible Transaction Overlay
The TXN_OVERLAY provides a read-your-writes view within an open transaction.
When a query runs inside a transaction, the engine merges the overlay (buffered but
uncommitted writes) with the base snapshot so that statements like
INSERT ... ON CONFLICT DO UPDATE can see rows inserted earlier in the same transaction.
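The merge itself is simple to model: the overlay's entries shadow the base snapshot for the keys this transaction has written. A minimal sketch, with a None value assumed to mark a buffered delete (an illustrative convention, not ParticleDB's representation):

```python
def read_with_overlay(base_snapshot, overlay):
    """Merge a transaction's buffered writes over its base snapshot."""
    merged = dict(base_snapshot)
    merged.update(overlay)            # uncommitted writes win for their keys
    # A None value marks a buffered delete and hides the base row.
    return {pk: row for pk, row in merged.items() if row is not None}

base = {1: "shipped", 2: "pending", 3: "pending"}   # committed rows by PK
overlay = {2: "cancelled", 4: "new", 3: None}       # this txn's buffered writes
view = read_with_overlay(base, overlay)
assert view == {1: "shipped", 2: "cancelled", 4: "new"}
```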
Row-Level Locking
ParticleDB supports SQL-standard row-level locking within transactions:
```sql
-- Exclusive lock: block other writers and lockers
SELECT * FROM orders WHERE id = 42 FOR UPDATE;

-- Shared lock: allow other readers, block writers
SELECT * FROM orders WHERE id = 42 FOR SHARE;

-- Skip rows locked by other transactions
SELECT * FROM orders WHERE status = 'pending' FOR UPDATE SKIP LOCKED;

-- Fail immediately if the row is locked
SELECT * FROM orders WHERE id = 42 FOR UPDATE NOWAIT;
```

| Clause | Behavior |
|---|---|
| FOR UPDATE | Acquires an exclusive row lock; blocks other FOR UPDATE / FOR SHARE |
| FOR SHARE | Acquires a shared row lock; allows other FOR SHARE, blocks FOR UPDATE |
| SKIP LOCKED | Silently skips rows held by other transactions |
| NOWAIT | Returns an error immediately if the row is already locked |
Row locks are released when the transaction commits or rolls back.
Two-Phase Commit
ParticleDB supports the two-phase commit (2PC) protocol for distributed transactions that span multiple systems:
```sql
-- Phase 1: Prepare
PREPARE TRANSACTION 'txn_abc123';

-- Phase 2: Commit (or Rollback)
COMMIT PREPARED 'txn_abc123';
-- or
ROLLBACK PREPARED 'txn_abc123';
```

Prepared transactions survive server restarts: they are persisted in the WAL and recovered on startup. A monitoring process or coordinator can then issue the final COMMIT or ROLLBACK.
Transaction Lifecycle
```
BEGIN
 │
 ├── Snapshot acquired (read timestamp)
 │
 ├── Read operations (see snapshot)
 ├── Write operations (buffered + WAL)
 │    ├── Conflict detection on write
 │    └── Row locks acquired (FOR UPDATE/FOR SHARE)
 │
 ├── COMMIT
 │    ├── WAL flush (per mode: sync/groupsync/nosync)
 │    ├── Writes applied to storage
 │    ├── Row locks released
 │    └── Cache invalidation (batch_cache, zone_maps, etc.)
 │
 └── ROLLBACK
      ├── Buffered writes discarded
      └── Row locks released
```

Performance Optimizations
Key Optimizations Summary
| Optimization | Impact |
|---|---|
| WAL disabled (nosync) | 2.6x throughput (40K to 106K TPS) |
| Concurrent INSERT append | INSERTs no longer block SELECTs/UPDATEs |
| txn_table_locks skip | Eliminates ~640K mutex acquisitions/sec |
| Thread-local WAL buffer | Eliminates WAL mutex contention |
| AHashMap for table lookups | 2-4x faster hash lookups for short string keys |
| has_prefix_indexes flag | Skip prefix index lock reads when no indexes exist |
| PK index single lookup | One get() instead of contains_key() + get() |
| Batch insert prefix index | Single lock per batch instead of per-row |
| Batch undo string allocation | Single to_string() per batch instead of per-row |
| Atomic row count increment | fetch_add safe for concurrent appenders |
TPC-C Benchmark Numbers
| Configuration | TPS (8 workers) |
|---|---|
| sync + row-2pl | 10,069 |
| groupsync + row-2pl | 83,520 |
| groupsync + table-2pl | 132,597 |
| nosync + table-2pl | 159,016 |
Jepsen Testing Status
ParticleDB is being validated with Jepsen-style append tests:
- 65% of append tests pass under the current isolation model.
- 0 info violations — no ambiguous or lost-update anomalies detected.
Work is ongoing to close the remaining 35% (primarily related to serializable ordering edge cases under high contention).
Remaining Bottlenecks
Section titled “Remaining Bottlenecks”- Per-row
Stringallocation for column names (~111 clones per transaction). - WAL serialization cost (~3.5 us per transaction in sync/groupsync modes).
- PK index write-lock contention under heavy concurrent INSERT + UPDATE workloads.