Skip to content

Joins

Joins combine rows from two or more tables based on a related column. ParticleDB uses a streaming hash join engine with parallel probe for large inputs.

Join TypeReturns
INNER JOINOnly rows that match in both tables
LEFT JOINAll rows from the left table, plus matches from the right (NULLs for non-matches)
RIGHT JOINAll rows from the right table, plus matches from the left (NULLs for non-matches)
FULL OUTER JOINAll rows from both tables (NULLs where no match exists on either side)
CROSS JOINEvery combination of rows from both tables (Cartesian product)
NATURAL JOINAutomatic INNER JOIN on all columns with the same name

The examples below use these three tables:

CREATE TABLE customers (
id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT DEFAULT 'standard'
);
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL,
total FLOAT64 NOT NULL,
status TEXT DEFAULT 'pending'
);
CREATE TABLE order_items (
id BIGINT PRIMARY KEY,
order_id BIGINT NOT NULL,
product TEXT NOT NULL,
quantity INT64 NOT NULL,
unit_price FLOAT64 NOT NULL
);
INSERT INTO customers VALUES
(1, 'Alice', 'premium'),
(2, 'Bob', 'standard'),
(3, 'Carol', 'premium'),
(4, 'Dave', 'standard');
INSERT INTO orders VALUES
(100, 1, 250.00, 'shipped'),
(101, 1, 75.00, 'delivered'),
(102, 2, 150.00, 'pending'),
(103, 5, 300.00, 'shipped'); -- customer_id 5 does not exist
INSERT INTO order_items VALUES
(1, 100, 'Keyboard', 1, 89.99),
(2, 100, 'Mouse', 2, 29.99),
(3, 101, 'USB Cable', 3, 9.99),
(4, 102, 'Monitor', 1, 149.99);

Returns only rows where the join condition matches in both tables. Rows without a match on either side are excluded.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00

Carol (no orders) and order 103 (customer_id 5 doesn’t exist) are both excluded.

JOIN without a qualifier defaults to INNER JOIN:

SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id;

Returns all rows from the left table. Where no match exists in the right table, right-side columns are filled with NULL.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00
CarolNULLNULL
DaveNULLNULL

Use LEFT JOIN to find rows without matches:

-- Customers who have never placed an order
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;
name
Carol
Dave

The mirror of LEFT JOIN: all rows from the right table are kept, and left-side columns are NULL for non-matches.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00
NULL103300.00

Order 103 (customer_id 5) has no matching customer, so name is NULL.


Returns all rows from both tables, with NULLs on the side where there is no match.

SELECT c.name, o.id AS order_id, o.total
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
nameorder_idtotal
Alice100250.00
Alice10175.00
Bob102150.00
CarolNULLNULL
DaveNULLNULL
NULL103300.00

Produces the Cartesian product — every row from the left table paired with every row from the right. No join condition is specified.

SELECT c.name, t.label
FROM customers c
CROSS JOIN (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);

This returns 4 customers x 4 quarters = 16 rows.

An equivalent syntax uses comma-separated FROM:

SELECT c.name, t.label
FROM customers c, (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4')) AS t(label);

Automatically joins on all columns with matching names in both tables. Eliminates duplicate columns from the result.

-- If both tables have an "id" column, NATURAL JOIN matches on id
-- This is rarely used in practice -- explicit ON/USING is preferred
SELECT * FROM employees NATURAL JOIN departments;

Caution: NATURAL JOIN can produce unexpected results if tables share column names that shouldn’t be join keys. Prefer explicit ON or USING clauses.


When the join column has the same name in both tables, USING is a cleaner alternative to ON:

-- These two are equivalent:
SELECT * FROM orders JOIN order_items ON orders.id = order_items.order_id;
-- Using USING when the column name matches
-- (this requires the column to have the same name in both tables)
CREATE TABLE order_items_v2 (
id BIGINT PRIMARY KEY,
id BIGINT NOT NULL, -- same name as orders.id
product TEXT
);
SELECT * FROM orders JOIN order_items_v2 USING (id);

In practice, USING is most useful when foreign key columns share names:

SELECT c.name, o.total
FROM customers c
JOIN orders o USING (customer_id);
-- Equivalent to: ON c.customer_id = o.customer_id

Chain joins to combine three or more tables:

-- Full order details: customer + order + items
SELECT
c.name AS customer,
o.id AS order_id,
o.status,
oi.product,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price AS line_total
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
ORDER BY c.name, o.id, oi.product;
customerorder_idstatusproductquantityunit_priceline_total
Alice100shippedKeyboard189.9989.99
Alice100shippedMouse229.9959.98
Alice101deliveredUSB Cable39.9929.97
Bob102pendingMonitor1149.99149.99
-- All customers, their orders (if any), and order items (if any)
SELECT
c.name,
o.id AS order_id,
oi.product
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id
ORDER BY c.name;

Join a table to itself to compare rows within the same table:

-- Find employees who earn more than their manager
SELECT
e.name AS employee,
e.salary AS employee_salary,
m.name AS manager,
m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Combine joins with GROUP BY to compute per-group metrics across tables:

-- Revenue per customer
SELECT
c.name,
c.tier,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS total_revenue
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name, c.tier
ORDER BY total_revenue DESC;
nametierorder_counttotal_revenue
Alicepremium2325.00
Bobstandard1150.00
Carolpremium00
Davestandard00

  • ParticleDB automatically picks the smaller table as the build side of the hash join
  • For inputs larger than 100K rows, the probe phase runs in parallel across CPU cores
  • The optimizer reorders joins for multi-way joins, placing the smallest tables first
  • Int64 key specialization: joins on integer keys use a fast path that avoids hashing overhead
  • For vector similarity joins, see Vector Search