OLAP Comparison
Netflix Dataset (100M rows)
| Query | ParticleDB | DuckDB | PostgreSQL | PDB vs DuckDB |
|---|---|---|---|---|
| COUNT(*) | 0.04ms | <1ms | 1,075ms | Metadata |
| SUM(rating) | 0.08ms | 7ms | 1,379ms | 90x |
| AVG GROUP BY movie (17K groups) | 31.9ms | 68ms | 2,665ms | 2.1x |
| COUNT GROUP BY rating (5 groups) | 5.2ms | 19ms | 2,221ms | 3.7x |
| GROUP BY 500K groups | 11.5ms | 12ms | — | Parity |
ParticleDB wins 5 of 7 queries vs DuckDB.
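The "PDB vs DuckDB" column can be reproduced from the timings. The sketch below assumes the column is simply the DuckDB time divided by the ParticleDB time, rounded for display; that definition is inferred from the numbers, and the `speedup` helper is a hypothetical name used only for illustration.

```python
# Timings in milliseconds, copied from the Netflix table above.
# Each entry is (particledb_ms, duckdb_ms).
netflix_ms = {
    "SUM(rating)": (0.08, 7.0),
    "AVG GROUP BY movie (17K groups)": (31.9, 68.0),
    "COUNT GROUP BY rating (5 groups)": (5.2, 19.0),
}


def speedup(particledb_ms: float, duckdb_ms: float) -> float:
    """Assumed definition: how many times faster ParticleDB is than DuckDB."""
    return duckdb_ms / particledb_ms


for query, (pdb_ms, duck_ms) in netflix_ms.items():
    print(f"{query}: {speedup(pdb_ms, duck_ms):.1f}x")
```

Under this assumption the ratios come out at roughly 87.5x, 2.1x, and 3.7x, which matches the table once the first value is rounded to 90x.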
Yandex Dataset (100M rows)
| Query | ParticleDB | DuckDB | PostgreSQL | PDB vs DuckDB |
|---|---|---|---|---|
| COUNT(*) | 0.04ms | <1ms | 1,157ms | Metadata |
| SUM(dwell_time) | 0.06ms | 10ms | 1,651ms | 170x |
| GROUP BY session_id (5M groups) | 66ms | 89ms | 11,137ms | 1.3x |
| GROUP BY action (4 groups) | 20ms | 24ms | 3,045ms | 1.2x |
| Filter + aggregate | 8.8ms | 9ms | — | Parity |
ParticleDB wins 7 of 8 queries vs DuckDB.
Airline Dataset (100M rows)
| Query | ParticleDB | DuckDB | PDB vs DuckDB |
|---|---|---|---|
| SUM(distance) | 0.06ms | 10ms | 156x |
| AVG GROUP BY carrier (20 groups) | 27ms | 33ms | 1.2x |
| Filter + 3 aggregates | 10ms | 23ms | 2.3x |
| GROUP BY 500K groups | 71ms | 96ms | 1.4x |
Where DuckDB Wins
DuckDB is faster on some query patterns:
- Multi-column GROUP BY with millions of groups: DuckDB’s morsel-driven vectorized execution handles very high-cardinality GROUP BY roughly 2-3x faster
- Complex JOIN + aggregate: DuckDB’s vectorized pipeline processes rows in batches, which avoids per-row function dispatch overhead
- String-heavy operations: DuckDB’s string dictionary encoding is more tightly integrated
We’re actively working on vectorized hash aggregation to close these gaps.
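As a rough illustration of what vectorized hash aggregation means, the sketch below processes input in fixed-size batches and splits the work into two tight loops: first resolve a dense group slot for every row in the batch, then accumulate into flat arrays by slot. This is not ParticleDB's or DuckDB's implementation; the function name, the batch size, and the use of a Python `dict` as the hash table are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

BATCH_SIZE = 1024  # hypothetical vector size, chosen only for illustration


def vectorized_sum_by_key(
    keys: Sequence[Hashable],
    values: Sequence[float],
    batch_size: int = BATCH_SIZE,
) -> Dict[Hashable, Tuple[float, int]]:
    """Return {key: (sum, count)} using batch-at-a-time aggregation."""
    slot_of: Dict[Hashable, int] = {}  # hash table: key -> dense slot index
    group_keys: List[Hashable] = []    # slot index -> key
    sums: List[float] = []             # flat accumulator arrays, indexed by slot
    counts: List[int] = []

    for start in range(0, len(keys), batch_size):
        key_batch = keys[start:start + batch_size]
        value_batch = values[start:start + batch_size]

        # Phase 1: find (or create) the group slot for every row in the batch.
        slots: List[int] = []
        for key in key_batch:
            slot = slot_of.get(key)
            if slot is None:
                slot = len(group_keys)
                slot_of[key] = slot
                group_keys.append(key)
                sums.append(0.0)
                counts.append(0)
            slots.append(slot)

        # Phase 2: update aggregate state by slot, with no hashing in this loop.
        for slot, value in zip(slots, value_batch):
            sums[slot] += value
            counts[slot] += 1

    return {key: (sums[i], counts[i]) for i, key in enumerate(group_keys)}
```

For example, `vectorized_sum_by_key(["a", "b", "a"], [1.0, 2.0, 3.0], batch_size=2)` groups the three rows across two batches. In a native engine the two phases would operate on typed column vectors, which is where the benefit over per-row dispatch comes from; pure Python only shows the structure.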
Why ParticleDB Excels at Analytics
| Technique | Benefit |
|---|---|
| Zone map precomputation | Ungrouped SUM/AVG/MIN/MAX in O(1) |
| Contiguous flat columns | Hardware prefetcher-friendly scans |
| DictU32 string encoding | 4-byte keys for string GROUP BY |
| SIMD mask accumulation | Branchless low-cardinality GROUP BY |
| Radix-partitioned hash tables | L2 cache-friendly for high cardinality |
| Tree-reduction merge | O(log N) parallel merge rounds instead of O(N) sequential merges |
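Two of the techniques in the table, zone map precomputation and tree-reduction merge, can be sketched together. The example below is a minimal illustration under stated assumptions, not ParticleDB's actual code: `ZoneStats`, `build_zone_stats`, `merge`, and `tree_reduce` are hypothetical names, and the zone size is arbitrary. Per-zone statistics are computed once at load time and combined pairwise, so an ungrouped SUM, AVG, MIN, or MAX can later be answered from the stored summary without scanning rows.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ZoneStats:
    """Precomputed statistics for one zone (a contiguous block of rows)."""
    count: int
    total: float
    minimum: float
    maximum: float


def build_zone_stats(column: Sequence[float], zone_size: int) -> List[ZoneStats]:
    """Scan the column once at load time and summarize each zone."""
    zones = []
    for start in range(0, len(column), zone_size):
        chunk = column[start:start + zone_size]
        zones.append(ZoneStats(len(chunk), sum(chunk), min(chunk), max(chunk)))
    return zones


def merge(a: ZoneStats, b: ZoneStats) -> ZoneStats:
    """Combine two partial summaries into one."""
    return ZoneStats(
        a.count + b.count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


def tree_reduce(parts: Sequence[ZoneStats]) -> ZoneStats:
    """Merge partial results pairwise, halving the list each round.

    The merges within a round are independent, so with enough workers the
    number of rounds is O(log N). This sketch runs them sequentially.
    """
    if not parts:
        raise ValueError("need at least one partial result")
    level = list(parts)
    while len(level) > 1:
        merged = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element carries over to the next round
            merged.append(level[-1])
        level = merged
    return level[0]


# Load time: build zone statistics and a table-level summary once.
column = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
zones = build_zone_stats(column, zone_size=3)
summary = tree_reduce(zones)

# Query time: ungrouped aggregates read the stored summary, no row scan.
average = summary.total / summary.count
```

The same `merge` function serves both purposes here: combining zone statistics at load time and combining per-thread partial aggregates at query time.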