Joins

Joins combine rows from two or more tables based on a related column. ParticleDB uses a streaming hash join engine with parallel probe for large inputs and dedicated operators for SEMI/ANTI/ASOF joins.

JOIN Types at a Glance

Join Type	Returns
`INNER JOIN`	Only rows that match in both tables
`LEFT JOIN`	All rows from the left table, plus matches from the right (NULLs for non-matches)
`RIGHT JOIN`	All rows from the right table, plus matches from the left (NULLs for non-matches)
`FULL OUTER JOIN`	All rows from both tables (NULLs where no match exists on either side)
`CROSS JOIN`	Every combination of rows from both tables (Cartesian product)
`NATURAL JOIN`	Automatic INNER JOIN on all columns with the same name
`LEFT SEMI JOIN`	Rows from the left table where at least one match exists on the right (left columns only). Equivalent to `WHERE EXISTS(...)`
`LEFT ANTI JOIN`	Rows from the left table where no match exists on the right (left columns only). Equivalent to `WHERE NOT EXISTS(...)`
`ASOF JOIN`	Time-series join: pairs each left row with the most recent prior right row (per equi-key)

Setup

The examples below use these three tables:

CREATE TABLE customers (
    id   BIGINT PRIMARY KEY,
    name TEXT NOT NULL,
    tier TEXT DEFAULT 'standard'
);

CREATE TABLE orders (
    id          BIGINT PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    total       FLOAT64 NOT NULL,
    status      TEXT DEFAULT 'pending'
);

CREATE TABLE order_items (
    id         BIGINT PRIMARY KEY,
    order_id   BIGINT NOT NULL,
    product    TEXT NOT NULL,
    quantity   INT64 NOT NULL,
    unit_price FLOAT64 NOT NULL
);

INSERT INTO customers VALUES
    (1, 'Alice', 'premium'),
    (2, 'Bob', 'standard'),
    (3, 'Carol', 'premium'),
    (4, 'Dave', 'standard');

INSERT INTO orders VALUES
    (100, 1, 250.00, 'shipped'),
    (101, 1, 75.00, 'delivered'),
    (102, 2, 150.00, 'pending'),
    (103, 5, 300.00, 'shipped');   -- customer_id 5 does not exist

INSERT INTO order_items VALUES
    (1, 100, 'Keyboard', 1, 89.99),
    (2, 100, 'Mouse', 2, 29.99),
    (3, 101, 'USB Cable', 3, 9.99),
    (4, 102, 'Monitor', 1, 149.99);

INNER JOIN

Returns only rows where the join condition matches in both tables. Rows without a match on either side are excluded.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

name	order_id	total
Alice	100	250.00
Alice	101	75.00
Bob	102	150.00

Carol (no orders) and order 103 (customer_id 5 doesn’t exist) are both excluded.

JOIN without a qualifier defaults to INNER JOIN:

SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id;

LEFT JOIN

Returns all rows from the left table. Where no match exists in the right table, right-side columns are filled with NULL.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;

name	order_id	total
Alice	100	250.00
Alice	101	75.00
Bob	102	150.00
Carol	NULL	NULL
Dave	NULL	NULL

Use LEFT JOIN to find rows without matches:

-- Customers who have never placed an order
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;

name
Carol
Dave

RIGHT JOIN

The mirror of LEFT JOIN: all rows from the right table are kept, and left-side columns are NULL for non-matches.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;

name	order_id	total
Alice	100	250.00
Alice	101	75.00
Bob	102	150.00
NULL	103	300.00

Order 103 (customer_id 5) has no matching customer, so name is NULL.

FULL OUTER JOIN

Returns all rows from both tables, with NULLs on the side where there is no match.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;

name	order_id	total
Alice	100	250.00
Alice	101	75.00
Bob	102	150.00
Carol	NULL	NULL
Dave	NULL	NULL
NULL	103	300.00

CROSS JOIN

Produces the Cartesian product — every row from the left table paired with every row from the right. No join condition is specified.

SELECT c.name, t.label
FROM customers c
CROSS JOIN (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);

This returns 4 customers x 4 quarters = 16 rows.

An equivalent syntax uses comma-separated FROM:

SELECT c.name, t.label
FROM customers c, (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);

NATURAL JOIN

Automatically joins on all columns with matching names in both tables. Eliminates duplicate columns from the result.

-- If both tables have an "id" column, NATURAL JOIN matches on id
-- This is rarely used in practice -- explicit ON/USING is preferred
SELECT * FROM employees NATURAL JOIN departments;

Caution: NATURAL JOIN can produce unexpected results if tables share column names that shouldn’t be join keys. Prefer explicit ON or USING clauses.

USING Clause

When the join column has the same name in both tables, USING is a cleaner alternative to ON:

-- These two are equivalent:
SELECT * FROM orders JOIN order_items ON orders.id = order_items.order_id;

-- Using USING when the column name matches
-- (this requires the column to have the same name in both tables)
CREATE TABLE order_items_v2 (
    id       BIGINT PRIMARY KEY,
    id       BIGINT NOT NULL,  -- same name as orders.id
    product  TEXT
);

SELECT * FROM orders JOIN order_items_v2 USING (id);

In practice, USING is most useful when foreign key columns share names:

SELECT c.name, o.total
FROM customers c
JOIN orders o USING (customer_id);
-- Equivalent to: ON c.customer_id = o.customer_id

Multi-Table Joins

Chain joins to combine three or more tables:

-- Full order details: customer + order + items
SELECT
    c.name AS customer,
    o.id AS order_id,
    o.status,
    oi.product,
    oi.quantity,
    oi.unit_price,
    oi.quantity * oi.unit_price AS line_total
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
ORDER BY c.name, o.id, oi.product;

customer	order_id	status	product	quantity	unit_price	line_total
Alice	100	shipped	Keyboard	1	89.99	89.99
Alice	100	shipped	Mouse	2	29.99	59.98
Alice	101	delivered	USB Cable	3	9.99	29.97
Bob	102	pending	Monitor	1	149.99	149.99

Mixing Join Types

-- All customers, their orders (if any), and order items (if any)
SELECT
    c.name,
    o.id AS order_id,
    oi.product
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id
ORDER BY c.name;

Self Join

Join a table to itself to compare rows within the same table:

-- Find employees who earn more than their manager
SELECT
    e.name AS employee,
    e.salary AS employee_salary,
    m.name AS manager,
    m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Join with Aggregation

Combine joins with GROUP BY to compute per-group metrics across tables:

-- Revenue per customer
SELECT
    c.name,
    c.tier,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total), 0) AS total_revenue
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name, c.tier
ORDER BY total_revenue DESC;

name	tier	order_count	total_revenue
Alice	premium	2	325.00
Bob	standard	1	150.00
Carol	premium	0	0
Dave	standard	0	0

LEFT SEMI JOIN

Returns rows from the left table where at least one matching row exists on the right. Output schema is the left table only — no right columns are included. Equivalent to WHERE EXISTS(...) but expressible as a join clause for clarity and to let the planner reason about it directly.

-- Customers who placed at least one order
SELECT c.id, c.name, c.tier
FROM customers c
LEFT SEMI JOIN orders o ON c.id = o.customer_id;

id	name	tier
1	Alice	premium
2	Bob	standard

Each left row is emitted at most once regardless of how many matching right rows exist. The planner pushes both left- and right-side filters below the join.

LEFT ANTI JOIN

Returns rows from the left table where no matching row exists on the right. Equivalent to WHERE NOT EXISTS(...).

-- Customers with no orders yet
SELECT c.id, c.name, c.tier
FROM customers c
LEFT ANTI JOIN orders o ON c.id = o.customer_id;

id	name	tier
3	Carol	premium
4	Dave	standard

Right-side predicates are NOT pushed below the join (doing so would change which rows are excluded). Left-side predicates push normally.

ASOF JOIN (time-series)

Pairs each left row with the most recent prior right row that matches the equi-key, where right.ts <= left.ts. The canonical use case is enriching a trades stream with the last-known price snapshot per symbol.

CREATE TABLE trades (symbol TEXT, ts BIGINT, qty BIGINT);
CREATE TABLE prices (symbol TEXT, ts BIGINT, price FLOAT64);

INSERT INTO trades VALUES ('AAA', 100, 10), ('AAA', 150, 20), ('AAA', 200, 40);
INSERT INTO prices VALUES ('AAA', 90, 1.0), ('AAA', 140, 1.5), ('AAA', 180, 2.0);

SELECT t.*, p.price
FROM trades t
ASOF JOIN prices p
  MATCH_CONDITION(p.ts <= t.ts)
  ON t.symbol = p.symbol;

symbol	ts	qty	price
AAA	100	10	1.0
AAA	150	20	1.5
AAA	200	40	2.0

Algorithm: ParticleDB hash-buckets the right side by the equi-key, sorts each bucket by timestamp, then binary-searches per left row. INNER ASOF semantics: rows with no prior match are dropped.

The ON clause must include both an equi-key (t.symbol = p.symbol) and a time predicate (p.ts <= t.ts or t.ts >= p.ts).

Performance Notes

ParticleDB automatically picks the smaller table as the build side of the hash join
For inputs larger than 100K rows, the probe phase runs in parallel across CPU cores
The optimizer reorders joins for multi-way joins, placing the smallest tables first
Int64 key specialization: joins on integer keys use a fast path that avoids hashing overhead
SEMI / ANTI use a dedicated operator that builds a hash set (existence-only) on the right side; output is bounded by left input size regardless of right-side multiplicity
ASOF uses sorted hash buckets per equi-key and binary-search probes; per-row cost is O(log N) on right-side bucket size
For vector similarity joins, see Vector Search