Skip to content

Joins

Joins combine rows from two or more tables based on a related column. ParticleDB uses a streaming hash join engine with parallel probe for large inputs and dedicated operators for SEMI/ANTI/ASOF joins.

Join TypeReturns
INNER JOINOnly rows that match in both tables
LEFT JOINAll rows from the left table, plus matches from the right (NULLs for non-matches)
RIGHT JOINAll rows from the right table, plus matches from the left (NULLs for non-matches)
FULL OUTER JOINAll rows from both tables (NULLs where no match exists on either side)
CROSS JOINEvery combination of rows from both tables (Cartesian product)
NATURAL JOINAutomatic INNER JOIN on all columns with the same name
LEFT SEMI JOINRows from the left table where at least one match exists on the right (left columns only). Equivalent to WHERE EXISTS(...)
LEFT ANTI JOINRows from the left table where no match exists on the right (left columns only). Equivalent to WHERE NOT EXISTS(...)
ASOF JOINTime-series join: pairs each left row with the most recent prior right row (per equi-key)

The examples below use these three tables:

CREATE TABLE customers (
id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT DEFAULT 'standard'
);
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL,
total FLOAT64 NOT NULL,
status TEXT DEFAULT 'pending'
);
CREATE TABLE order_items (
id BIGINT PRIMARY KEY,
order_id BIGINT NOT NULL,
product TEXT NOT NULL,
quantity INT64 NOT NULL,
unit_price FLOAT64 NOT NULL
);
INSERT INTO customers VALUES
(1, 'Alice', 'premium'),
(2, 'Bob', 'standard'),
(3, 'Carol', 'premium'),
(4, 'Dave', 'standard');
INSERT INTO orders VALUES
(100, 1, 250.00, 'shipped'),
(101, 1, 75.00, 'delivered'),
(102, 2, 150.00, 'pending'),
(103, 5, 300.00, 'shipped'); -- customer_id 5 does not exist
INSERT INTO order_items VALUES
(1, 100, 'Keyboard', 1, 89.99),
(2, 100, 'Mouse', 2, 29.99),
(3, 101, 'USB Cable', 3, 9.99),
(4, 102, 'Monitor', 1, 149.99);

Returns only rows where the join condition matches in both tables. Rows without a match on either side are excluded.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00

Carol (no orders) and order 103 (customer_id 5 doesn’t exist) are both excluded.

JOIN without a qualifier defaults to INNER JOIN:

SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id;

Returns all rows from the left table. Where no match exists in the right table, right-side columns are filled with NULL.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00
CarolNULLNULL
DaveNULLNULL

Use LEFT JOIN to find rows without matches:

-- Customers who have never placed an order
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;
name
Carol
Dave

The mirror of LEFT JOIN: all rows from the right table are kept, and left-side columns are NULL for non-matches.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00
NULL103300.00

Order 103 (customer_id 5) has no matching customer, so name is NULL.


Returns all rows from both tables, with NULLs on the side where there is no match.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00
CarolNULLNULL
DaveNULLNULL
NULL103300.00

Produces the Cartesian product — every row from the left table paired with every row from the right. No join condition is specified.

SELECT c.name, t.label
FROM customers c
CROSS JOIN (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);

This returns 4 customers x 4 quarters = 16 rows.

An equivalent syntax uses comma-separated FROM:

SELECT c.name, t.label
FROM customers c, (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);

Automatically joins on all columns with matching names in both tables. Eliminates duplicate columns from the result.

-- If both tables have an "id" column, NATURAL JOIN matches on id
-- This is rarely used in practice -- explicit ON/USING is preferred
SELECT * FROM employees NATURAL JOIN departments;

Caution: NATURAL JOIN can produce unexpected results if tables share column names that shouldn’t be join keys. Prefer explicit ON or USING clauses.


When the join column has the same name in both tables, USING is a cleaner alternative to ON:

-- These two are equivalent:
SELECT * FROM orders JOIN order_items ON orders.id = order_items.order_id;
-- Using USING when the column name matches
-- (this requires the column to have the same name in both tables)
CREATE TABLE order_items_v2 (
id BIGINT PRIMARY KEY,
id BIGINT NOT NULL, -- same name as orders.id
product TEXT
);
SELECT * FROM orders JOIN order_items_v2 USING (id);

In practice, USING is most useful when foreign key columns share names:

SELECT c.name, o.total
FROM customers c
JOIN orders o USING (customer_id);
-- Equivalent to: ON c.customer_id = o.customer_id

Chain joins to combine three or more tables:

-- Full order details: customer + order + items
SELECT
c.name AS customer,
o.id AS order_id,
o.status,
oi.product,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price AS line_total
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
ORDER BY c.name, o.id, oi.product;
customerorder_idstatusproductquantityunit_priceline_total
Alice100shippedKeyboard189.9989.99
Alice100shippedMouse229.9959.98
Alice101deliveredUSB Cable39.9929.97
Bob102pendingMonitor1149.99149.99
-- All customers, their orders (if any), and order items (if any)
SELECT
c.name,
o.id AS order_id,
oi.product
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id
ORDER BY c.name;

Join a table to itself to compare rows within the same table:

-- Find employees who earn more than their manager
SELECT
e.name AS employee,
e.salary AS employee_salary,
m.name AS manager,
m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Combine joins with GROUP BY to compute per-group metrics across tables:

-- Revenue per customer
SELECT
c.name,
c.tier,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS total_revenue
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name, c.tier
ORDER BY total_revenue DESC;
nametierorder_counttotal_revenue
Alicepremium2325.00
Bobstandard1150.00
Carolpremium00
Davestandard00

Returns rows from the left table where at least one matching row exists on the right. Output schema is the left table only — no right columns are included. Equivalent to WHERE EXISTS(...) but expressible as a join clause for clarity and to let the planner reason about it directly.

-- Customers who placed at least one order
SELECT c.id, c.name, c.tier
FROM customers c
LEFT SEMI JOIN orders o ON c.id = o.customer_id;
idnametier
1Alicepremium
2Bobstandard

Each left row is emitted at most once regardless of how many matching right rows exist. The planner pushes both left- and right-side filters below the join.

Returns rows from the left table where no matching row exists on the right. Equivalent to WHERE NOT EXISTS(...).

-- Customers with no orders yet
SELECT c.id, c.name, c.tier
FROM customers c
LEFT ANTI JOIN orders o ON c.id = o.customer_id;
idnametier
3Carolpremium
4Davestandard

Right-side predicates are NOT pushed below the join (doing so would change which rows are excluded). Left-side predicates push normally.

Pairs each left row with the most recent prior right row that matches the equi-key, where right.ts <= left.ts. The canonical use case is enriching a trades stream with the last-known price snapshot per symbol.

CREATE TABLE trades (symbol TEXT, ts BIGINT, qty BIGINT);
CREATE TABLE prices (symbol TEXT, ts BIGINT, price FLOAT64);
INSERT INTO trades VALUES ('AAA', 100, 10), ('AAA', 150, 20), ('AAA', 200, 40);
INSERT INTO prices VALUES ('AAA', 90, 1.0), ('AAA', 140, 1.5), ('AAA', 180, 2.0);
SELECT t.*, p.price
FROM trades t
ASOF JOIN prices p
MATCH_CONDITION(p.ts <= t.ts)
ON t.symbol = p.symbol;
symboltsqtyprice
AAA100101.0
AAA150201.5
AAA200402.0

Algorithm: ParticleDB hash-buckets the right side by the equi-key, sorts each bucket by timestamp, then binary-searches per left row. INNER ASOF semantics: rows with no prior match are dropped.

The ON clause must include both an equi-key (t.symbol = p.symbol) and a time predicate (p.ts <= t.ts or t.ts >= p.ts).


  • ParticleDB automatically picks the smaller table as the build side of the hash join
  • For inputs larger than 100K rows, the probe phase runs in parallel across CPU cores
  • The optimizer reorders joins for multi-way joins, placing the smallest tables first
  • Int64 key specialization: joins on integer keys use a fast path that avoids hashing overhead
  • SEMI / ANTI use a dedicated operator that builds a hash set (existence-only) on the right side; output is bounded by left input size regardless of right-side multiplicity
  • ASOF uses sorted hash buckets per equi-key and binary-search probes; per-row cost is O(log N) on right-side bucket size
  • For vector similarity joins, see Vector Search