Architecture Overview

ParticleDB is a hybrid transactional/analytical processing (HTAP) database written entirely in Rust. A single process serves both OLTP point lookups and OLAP analytical queries over the same data, eliminating the need for separate ETL pipelines between operational and analytical systems.

┌──────────────────────────────────────────────────────────────────┐
│ Client Connections │
│ PostgreSQL Wire │ gRPC │ HTTP/REST │ Redis RESP │ WS │
└────────┬───────────┴────┬───┴──────┬──────┴───────┬──────┴──┬───┘
│ │ │ │ │
┌────────▼────────────────▼──────────▼──────────────▼─────────▼───┐
│ Network Layer │
│ Protocol handlers, TLS, auth, routing │
└────────────────────────────┬────────────────────────────────────┘
┌────────────────────────────▼────────────────────────────────────┐
│ SQL Parser & Planner │
│ Parse → AST → Logical Plan → Optimize → Physical Plan │
│ Predicate pushdown · Projection pushdown · Constant folding │
│ Join reordering · Plan cache │
└────────────────────────────┬────────────────────────────────────┘
┌────────────────────────────▼────────────────────────────────────┐
│ Query Execution Engine │
│ Vectorized operators on Apache Arrow columnar arrays │
│ SIMD aggregation · Zone-level precomputed aggregates │
│ Parallel execution via Rayon thread pool │
│ Fused filter+aggregate · Dense-array GROUP BY │
└─────────┬──────────────────────────────────┬────────────────────┘
│ │
┌─────────▼──────────┐ ┌────────────▼────────────────────┐
│ Transaction Engine │ │ Vector Index │
│ MVCC + Snapshot │ │ HNSW · IVFFlat │
│ Isolation │ │ L2 · Cosine · Inner Product │
│ WAL · 2PC │ └────────────────────────────────┘
│ Row-level locking │
└─────────┬──────────┘
┌─────────▼──────────────────────────────────────────────────────┐
│ Storage Engine │
│ LSM-tree · Write-Ahead Log · Batch cache │
│ Zone maps (precomputed sum/count/min/max per chunk) │
│ Dictionary encoding · Flat column cache │
│ Compression: none / LZ4 / Zstd │
└─────────────────────────────────────────────────────────────────┘

The SQL layer parses incoming queries into an abstract syntax tree and transforms them through a series of optimization passes before producing a physical execution plan.

| Optimization | Description |
| --- | --- |
| Predicate pushdown | Pushes WHERE filters through projections, joins, and into scans to reduce rows early |
| Projection pushdown | Narrows scan projections to only the columns referenced by the query |
| Constant folding | Evaluates constant sub-expressions at plan time (e.g., `AND TRUE` simplified away) |
| Join reordering | Greedy heuristic: flattens multi-way inner joins, sorts by estimated cardinality, rebuilds a left-deep tree |
| Plan cache | Normalized SQL text maps to cached physical plans; cleared on DDL changes |
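As an illustration of the constant-folding pass, here is a minimal sketch over a toy expression type (hypothetical `Expr` enum and `fold` function for illustration, not ParticleDB's actual planner structures): `x AND TRUE` collapses to `x`, and `x AND FALSE` collapses to `FALSE` before execution ever begins.

```rust
// Minimal constant-folding sketch (hypothetical AST, illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Bool(bool),
    Column(String),
    And(Box<Expr>, Box<Expr>),
}

/// Simplify constant sub-expressions: `x AND TRUE` -> `x`, `x AND FALSE` -> `FALSE`.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::And(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
            (Expr::Bool(true), x) | (x, Expr::Bool(true)) => x,
            (l, r) => Expr::And(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

fn main() {
    // WHERE active AND TRUE  =>  WHERE active
    let e = Expr::And(
        Box::new(Expr::Column("active".into())),
        Box::new(Expr::Bool(true)),
    );
    assert_eq!(fold(e), Expr::Column("active".into()));
    println!("AND TRUE folded away");
}
```

A real planner applies the same idea recursively to arithmetic and comparison operators; the structure of the pass is identical.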

The engine executes physical plans using vectorized operators over Apache Arrow columnar arrays. Chunks of 8,192 rows keep a single 64-bit column at 64 KB, sized to stay within L1 cache. The Rayon thread pool enables parallel execution across chunks and partitions.

Key execution strategies:

  • Zone-level precomputed aggregation — resolve SUM / COUNT / MIN / MAX from pre-built zone statistics in O(chunks) instead of O(rows).
  • Dense-array GROUP BY — O(1) per row for integer group keys; no hash table overhead.
  • SIMD mask-based accumulation — filtered aggregates use bit-scanning over boolean masks with trailing-zeros iteration.
  • Fused filter+aggregate — filter and aggregation run in a single pass with no intermediate batch materialization.
  • Streaming hash joins — build on the smaller side, probe in parallel across batches.
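The mask-based accumulation strategy above can be sketched in a few lines (an illustrative standalone function, not ParticleDB's actual operator code): the filter produces one `u64` bitmask word per 64 rows, and the accumulator visits only the set bits via trailing-zeros iteration instead of branching on every row.

```rust
// Sketch of mask-based filtered aggregation: iterate set bits of a
// per-64-row bitmask with trailing_zeros, clearing the lowest bit each step.
fn masked_sum(values: &[i64], mask_words: &[u64]) -> i64 {
    let mut sum = 0;
    for (w, &word) in mask_words.iter().enumerate() {
        let mut bits = word;
        while bits != 0 {
            let i = bits.trailing_zeros() as usize; // index of lowest set bit
            sum += values[w * 64 + i];
            bits &= bits - 1; // clear lowest set bit
        }
    }
    sum
}

fn main() {
    let values: Vec<i64> = (0..128).collect();
    // Rows 0 and 1 pass the filter in the first word; row 64 in the second.
    let mask = vec![0b11u64, 0b1u64];
    assert_eq!(masked_sum(&values, &mask), 65); // 0 + 1 + 64
}
```

For sparse masks this touches only the selected rows; for dense masks the loop degenerates gracefully into a near-sequential scan.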

See the Query Engine deep dive for full details.
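To make the dense-array GROUP BY strategy concrete, here is a simplified sketch (hypothetical function, assuming integer keys already densely numbered `0..num_groups`): group state lives in a flat `Vec` indexed directly by key, so each row costs one array write instead of a hash-table probe.

```rust
// Sketch of dense-array GROUP BY for small integer keys: no hash table,
// the key itself is the index into the accumulator array.
fn group_sum(keys: &[u32], values: &[i64], num_groups: usize) -> Vec<i64> {
    let mut sums = vec![0i64; num_groups];
    for (&k, &v) in keys.iter().zip(values) {
        sums[k as usize] += v; // O(1) per row, no hashing
    }
    sums
}

fn main() {
    let keys = [0u32, 1, 0, 2, 1];
    let values = [10i64, 20, 30, 40, 50];
    assert_eq!(group_sum(&keys, &values, 3), vec![40, 70, 40]);
}
```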

An LSM-tree forms the persistent layer, with a write-ahead log (WAL) for crash recovery. On top of the LSM sits a batch cache that holds recently ingested data as Arrow RecordBatch arrays, and a flat column cache that concatenates all batches for a column into a single contiguous Vec for hardware-prefetch-friendly sequential scans.

Zone maps track per-chunk min, max, sum, and count, enabling the query engine to skip entire chunks or resolve aggregates without touching rows. Low-cardinality string columns use dictionary encoding with direct-index aggregation.
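A sketch of how zone maps enable both behaviors (hypothetical `ZoneMap` struct mirroring the per-chunk statistics described above, not the engine's real types): a predicate like `v > 100` can skip any chunk whose max is at most 100, and an unfiltered SUM can be answered from the zone statistics alone, in O(chunks) rather than O(rows).

```rust
// Sketch of per-chunk zone-map statistics and pruning (illustrative only).
struct ZoneMap {
    min: i64,
    max: i64,
    sum: i64,
    count: u64,
}

fn build_zone(chunk: &[i64]) -> ZoneMap {
    ZoneMap {
        min: chunk.iter().copied().min().unwrap_or(i64::MAX),
        max: chunk.iter().copied().max().unwrap_or(i64::MIN),
        sum: chunk.iter().sum(),
        count: chunk.len() as u64,
    }
}

/// Can a `value > threshold` filter skip this chunk entirely?
fn can_skip_gt(zone: &ZoneMap, threshold: i64) -> bool {
    zone.max <= threshold
}

fn main() {
    let chunks = [vec![1i64, 5, 9], vec![200i64, 300]];
    let zones: Vec<ZoneMap> = chunks.iter().map(|c| build_zone(c)).collect();
    // Unfiltered SUM resolved from zone stats, never touching rows:
    let total: i64 = zones.iter().map(|z| z.sum).sum();
    assert_eq!(total, 515);
    // `v > 100` prunes the first chunk without reading it:
    assert!(can_skip_gt(&zones[0], 100));
    assert!(!can_skip_gt(&zones[1], 100));
}
```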

See the Storage Engine deep dive for full details.

ParticleDB provides MVCC with snapshot isolation. Three WAL synchronization modes trade durability for throughput:

| Mode | Behavior | Use case |
| --- | --- | --- |
| sync | Per-entry fsync | Maximum durability |
| groupsync | Batched fsync across transactions | Balanced (default) |
| nosync | WAL writes skipped entirely | Maximum throughput |

Concurrent INSERT append uses a read lock plus an append lock, allowing inserts to proceed without blocking concurrent reads or updates. Row-level locking supports FOR UPDATE / FOR SHARE with SKIP LOCKED and NOWAIT.
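The visibility rule at the heart of snapshot isolation can be sketched as follows (hypothetical `RowVersion` fields for illustration): a row version is visible to a snapshot if it was committed at or before the snapshot's timestamp and not deleted as of that timestamp.

```rust
// Sketch of MVCC snapshot-visibility check (illustrative field names).
struct RowVersion {
    created_ts: u64,         // commit timestamp of the inserting txn
    deleted_ts: Option<u64>, // commit timestamp of the deleting txn, if any
}

fn visible(row: &RowVersion, snapshot_ts: u64) -> bool {
    row.created_ts <= snapshot_ts
        && row.deleted_ts.map_or(true, |d| d > snapshot_ts)
}

fn main() {
    let live = RowVersion { created_ts: 5, deleted_ts: None };
    let gone = RowVersion { created_ts: 5, deleted_ts: Some(8) };
    assert!(visible(&live, 10));
    assert!(visible(&gone, 7));  // snapshot taken before the delete committed
    assert!(!visible(&gone, 9)); // snapshot taken after the delete
    assert!(!visible(&live, 4)); // snapshot taken before the insert committed
}
```

Because each snapshot evaluates this rule against immutable version metadata, readers never block writers and writers never block readers.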

See the Transaction Engine deep dive for full details.

ParticleDB exposes five wire protocols through a single process, so applications can connect with whichever protocol fits their stack:

| Protocol | Default Port | Primary Use |
| --- | --- | --- |
| PostgreSQL wire | 5432 | SQL access, ORM compatibility |
| gRPC | 50051 | Typed RPC, streaming, SDK backbone |
| HTTP / REST | 8080 | Lightweight queries, admin API |
| Redis RESP | 6379 | Key-value and data-structure access |
| WebSocket | 8080 | Real-time subscriptions, browser apps |

All protocols share the same underlying storage and transaction engine, so data written via one protocol is immediately visible through any other.

ParticleDB includes built-in vector similarity search without external plugins:

  • HNSW (Hierarchical Navigable Small World) for high-recall approximate nearest neighbor search.
  • IVFFlat (Inverted File with Flat quantization) for large-scale workloads.
  • Distance metrics: L2 (Euclidean), Cosine, and Inner Product.

Vector columns are defined with standard SQL DDL and queried with the <-> operator or the vector_search() function. See Vector Search for SQL syntax.
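The three distance metrics have simple definitions, sketched here over plain `f32` slices (illustrative standalone functions; the index itself operates over stored vector columns):

```rust
// The three supported distance metrics, in their textbook forms.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    // Euclidean distance: sqrt of the sum of squared component differences.
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    // 1 minus the cosine of the angle between a and b.
    let dot = inner_product(a, b);
    let norm = inner_product(a, a).sqrt() * inner_product(b, b).sqrt();
    1.0 - dot / norm
}

fn main() {
    let a = [1.0f32, 0.0];
    let b = [0.0f32, 1.0];
    assert!((l2(&a, &b) - 2.0f32.sqrt()).abs() < 1e-6);
    assert_eq!(inner_product(&a, &b), 0.0);
    assert!((cosine_distance(&a, &b) - 1.0).abs() < 1e-6); // orthogonal
}
```

Cosine distance ignores vector magnitude, L2 is magnitude-sensitive, and inner product is typically used with embeddings trained for maximum-dot-product retrieval.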

The architecture reflects five design principles:

  1. Single binary — one Rust binary contains the full database. No JVM, no external dependencies, no sidecar processes.
  2. Columnar-first — Arrow columnar format from storage through execution means analytical queries scan only the columns they need.
  3. Cache-conscious — chunk sizes, flat column caches, and dense-array accumulators are tuned to fit CPU L1/L2 caches.
  4. Protocol diversity — five protocols let you use the right tool for each workload without proxies or adapters.
  5. HTAP by design — OLTP and OLAP share the same storage and transaction engine rather than replicating data between separate systems.