Joins
Joins combine rows from two or more tables based on a related column. ParticleDB uses a streaming hash join engine with parallel probe for large inputs and dedicated operators for SEMI/ANTI/ASOF joins.
JOIN Types at a Glance
Section titled “JOIN Types at a Glance”| Join Type | Returns |
|---|---|
INNER JOIN | Only rows that match in both tables |
LEFT JOIN | All rows from the left table, plus matches from the right (NULLs for non-matches) |
RIGHT JOIN | All rows from the right table, plus matches from the left (NULLs for non-matches) |
FULL OUTER JOIN | All rows from both tables (NULLs where no match exists on either side) |
CROSS JOIN | Every combination of rows from both tables (Cartesian product) |
NATURAL JOIN | Automatic INNER JOIN on all columns with the same name |
LEFT SEMI JOIN | Rows from the left table where at least one match exists on the right (left columns only). Equivalent to WHERE EXISTS(...) |
LEFT ANTI JOIN | Rows from the left table where no match exists on the right (left columns only). Equivalent to WHERE NOT EXISTS(...) |
ASOF JOIN | Time-series join: pairs each left row with the most recent prior right row (per equi-key) |
The examples below use these three tables:
CREATE TABLE customers ( id BIGINT PRIMARY KEY, name TEXT NOT NULL, tier TEXT DEFAULT 'standard');
CREATE TABLE orders ( id BIGINT PRIMARY KEY, customer_id BIGINT NOT NULL, total FLOAT64 NOT NULL, status TEXT DEFAULT 'pending');
CREATE TABLE order_items ( id BIGINT PRIMARY KEY, order_id BIGINT NOT NULL, product TEXT NOT NULL, quantity INT64 NOT NULL, unit_price FLOAT64 NOT NULL);
INSERT INTO customers VALUES (1, 'Alice', 'premium'), (2, 'Bob', 'standard'), (3, 'Carol', 'premium'), (4, 'Dave', 'standard');
INSERT INTO orders VALUES (100, 1, 250.00, 'shipped'), (101, 1, 75.00, 'delivered'), (102, 2, 150.00, 'pending'), (103, 5, 300.00, 'shipped'); -- customer_id 5 does not exist
INSERT INTO order_items VALUES (1, 100, 'Keyboard', 1, 89.99), (2, 100, 'Mouse', 2, 29.99), (3, 101, 'USB Cable', 3, 9.99), (4, 102, 'Monitor', 1, 149.99);INNER JOIN
Section titled “INNER JOIN”Returns only rows where the join condition matches in both tables. Rows without a match on either side are excluded.
SELECT c.name, o.id AS order_id, o.totalFROM customers cINNER JOIN orders o ON c.id = o.customer_id;| name | order_id | total |
|---|---|---|
| Alice | 100 | 250.00 |
| Alice | 101 | 75.00 |
| Bob | 102 | 150.00 |
Carol (no orders) and order 103 (customer_id 5 doesn’t exist) are both excluded.
JOIN without a qualifier defaults to INNER JOIN:
SELECT c.name, o.totalFROM customers cJOIN orders o ON c.id = o.customer_id;LEFT JOIN
Section titled “LEFT JOIN”Returns all rows from the left table. Where no match exists in the right table, right-side columns are filled with NULL.
SELECT c.name, o.id AS order_id, o.totalFROM customers cLEFT JOIN orders o ON c.id = o.customer_id;| name | order_id | total |
|---|---|---|
| Alice | 100 | 250.00 |
| Alice | 101 | 75.00 |
| Bob | 102 | 150.00 |
| Carol | NULL | NULL |
| Dave | NULL | NULL |
Use LEFT JOIN to find rows without matches:
-- Customers who have never placed an orderSELECT c.nameFROM customers cLEFT JOIN orders o ON c.id = o.customer_idWHERE o.id IS NULL;| name |
|---|
| Carol |
| Dave |
RIGHT JOIN
Section titled “RIGHT JOIN”The mirror of LEFT JOIN: all rows from the right table are kept, and left-side columns are NULL for non-matches.
SELECT c.name, o.id AS order_id, o.totalFROM customers cRIGHT JOIN orders o ON c.id = o.customer_id;| name | order_id | total |
|---|---|---|
| Alice | 100 | 250.00 |
| Alice | 101 | 75.00 |
| Bob | 102 | 150.00 |
| NULL | 103 | 300.00 |
Order 103 (customer_id 5) has no matching customer, so name is NULL.
FULL OUTER JOIN
Section titled “FULL OUTER JOIN”Returns all rows from both tables, with NULLs on the side where there is no match.
SELECT c.name, o.id AS order_id, o.totalFROM customers cFULL OUTER JOIN orders o ON c.id = o.customer_id;| name | order_id | total |
|---|---|---|
| Alice | 100 | 250.00 |
| Alice | 101 | 75.00 |
| Bob | 102 | 150.00 |
| Carol | NULL | NULL |
| Dave | NULL | NULL |
| NULL | 103 | 300.00 |
CROSS JOIN
Section titled “CROSS JOIN”Produces the Cartesian product — every row from the left table paired with every row from the right. No join condition is specified.
SELECT c.name, t.labelFROM customers cCROSS JOIN (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);This returns 4 customers x 4 quarters = 16 rows.
An equivalent syntax uses comma-separated FROM:
SELECT c.name, t.labelFROM customers c, (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);NATURAL JOIN
Section titled “NATURAL JOIN”Automatically joins on all columns with matching names in both tables. Eliminates duplicate columns from the result.
-- If both tables have an "id" column, NATURAL JOIN matches on id-- This is rarely used in practice -- explicit ON/USING is preferredSELECT * FROM employees NATURAL JOIN departments;Caution: NATURAL JOIN can produce unexpected results if tables share column names that shouldn’t be join keys. Prefer explicit ON or USING clauses.
USING Clause
Section titled “USING Clause”When the join column has the same name in both tables, USING is a cleaner alternative to ON:
-- These two are equivalent:SELECT * FROM orders JOIN order_items ON orders.id = order_items.order_id;
-- Using USING when the column name matches-- (this requires the column to have the same name in both tables)CREATE TABLE order_items_v2 ( id BIGINT PRIMARY KEY, id BIGINT NOT NULL, -- same name as orders.id product TEXT);
SELECT * FROM orders JOIN order_items_v2 USING (id);In practice, USING is most useful when foreign key columns share names:
SELECT c.name, o.totalFROM customers cJOIN orders o USING (customer_id);-- Equivalent to: ON c.customer_id = o.customer_idMulti-Table Joins
Section titled “Multi-Table Joins”Chain joins to combine three or more tables:
-- Full order details: customer + order + itemsSELECT c.name AS customer, o.id AS order_id, o.status, oi.product, oi.quantity, oi.unit_price, oi.quantity * oi.unit_price AS line_totalFROM customers cJOIN orders o ON c.id = o.customer_idJOIN order_items oi ON o.id = oi.order_idORDER BY c.name, o.id, oi.product;| customer | order_id | status | product | quantity | unit_price | line_total |
|---|---|---|---|---|---|---|
| Alice | 100 | shipped | Keyboard | 1 | 89.99 | 89.99 |
| Alice | 100 | shipped | Mouse | 2 | 29.99 | 59.98 |
| Alice | 101 | delivered | USB Cable | 3 | 9.99 | 29.97 |
| Bob | 102 | pending | Monitor | 1 | 149.99 | 149.99 |
Mixing Join Types
Section titled “Mixing Join Types”-- All customers, their orders (if any), and order items (if any)SELECT c.name, o.id AS order_id, oi.productFROM customers cLEFT JOIN orders o ON c.id = o.customer_idLEFT JOIN order_items oi ON o.id = oi.order_idORDER BY c.name;Self Join
Section titled “Self Join”Join a table to itself to compare rows within the same table:
-- Find employees who earn more than their managerSELECT e.name AS employee, e.salary AS employee_salary, m.name AS manager, m.salary AS manager_salaryFROM employees eJOIN employees m ON e.manager_id = m.idWHERE e.salary > m.salary;Join with Aggregation
Section titled “Join with Aggregation”Combine joins with GROUP BY to compute per-group metrics across tables:
-- Revenue per customerSELECT c.name, c.tier, COUNT(o.id) AS order_count, COALESCE(SUM(o.total), 0) AS total_revenueFROM customers cLEFT JOIN orders o ON c.id = o.customer_idGROUP BY c.name, c.tierORDER BY total_revenue DESC;| name | tier | order_count | total_revenue |
|---|---|---|---|
| Alice | premium | 2 | 325.00 |
| Bob | standard | 1 | 150.00 |
| Carol | premium | 0 | 0 |
| Dave | standard | 0 | 0 |
LEFT SEMI JOIN
Section titled “LEFT SEMI JOIN”Returns rows from the left table where at least one matching row exists on the right. Output schema is the left table only — no right columns are included. Equivalent to WHERE EXISTS(...) but expressible as a join clause for clarity and to let the planner reason about it directly.
-- Customers who placed at least one orderSELECT c.id, c.name, c.tierFROM customers cLEFT SEMI JOIN orders o ON c.id = o.customer_id;| id | name | tier |
|---|---|---|
| 1 | Alice | premium |
| 2 | Bob | standard |
Each left row is emitted at most once regardless of how many matching right rows exist. The planner pushes both left- and right-side filters below the join.
LEFT ANTI JOIN
Section titled “LEFT ANTI JOIN”Returns rows from the left table where no matching row exists on the right. Equivalent to WHERE NOT EXISTS(...).
-- Customers with no orders yetSELECT c.id, c.name, c.tierFROM customers cLEFT ANTI JOIN orders o ON c.id = o.customer_id;| id | name | tier |
|---|---|---|
| 3 | Carol | premium |
| 4 | Dave | standard |
Right-side predicates are NOT pushed below the join (doing so would change which rows are excluded). Left-side predicates push normally.
ASOF JOIN (time-series)
Section titled “ASOF JOIN (time-series)”Pairs each left row with the most recent prior right row that matches the equi-key, where right.ts <= left.ts. The canonical use case is enriching a trades stream with the last-known price snapshot per symbol.
CREATE TABLE trades (symbol TEXT, ts BIGINT, qty BIGINT);CREATE TABLE prices (symbol TEXT, ts BIGINT, price FLOAT64);
INSERT INTO trades VALUES ('AAA', 100, 10), ('AAA', 150, 20), ('AAA', 200, 40);INSERT INTO prices VALUES ('AAA', 90, 1.0), ('AAA', 140, 1.5), ('AAA', 180, 2.0);
SELECT t.*, p.priceFROM trades tASOF JOIN prices p MATCH_CONDITION(p.ts <= t.ts) ON t.symbol = p.symbol;| symbol | ts | qty | price |
|---|---|---|---|
| AAA | 100 | 10 | 1.0 |
| AAA | 150 | 20 | 1.5 |
| AAA | 200 | 40 | 2.0 |
Algorithm: ParticleDB hash-buckets the right side by the equi-key, sorts each bucket by timestamp, then binary-searches per left row. INNER ASOF semantics: rows with no prior match are dropped.
The ON clause must include both an equi-key (t.symbol = p.symbol) and a time predicate (p.ts <= t.ts or t.ts >= p.ts).
Performance Notes
Section titled “Performance Notes”- ParticleDB automatically picks the smaller table as the build side of the hash join
- For inputs larger than 100K rows, the probe phase runs in parallel across CPU cores
- The optimizer reorders joins for multi-way joins, placing the smallest tables first
- Int64 key specialization: joins on integer keys use a fast path that avoids hashing overhead
- SEMI / ANTI use a dedicated operator that builds a hash set (existence-only) on the right side; output is bounded by left input size regardless of right-side multiplicity
- ASOF uses sorted hash buckets per equi-key and binary-search probes; per-row cost is O(log N) on right-side bucket size
- For vector similarity joins, see Vector Search