Aggregations

Aggregate functions compute a single result from a set of input rows. Use them with GROUP BY to compute per-group results, or without GROUP BY for a global result across all rows.

Basic Aggregates

COUNT

-- Count all rows
SELECT COUNT(*) FROM orders;

-- Count non-null values in a column
SELECT COUNT(email) FROM users;

-- Count distinct values
SELECT COUNT(DISTINCT category) FROM products;

COUNT(*) is highly optimized — ParticleDB can resolve it from metadata in O(1) when no filter is applied.

SUM

Sum of non-null values. Returns NULL if all inputs are NULL.

SELECT SUM(total) AS total_revenue FROM orders;

SELECT category, SUM(price * quantity) AS category_revenue
FROM order_items
GROUP BY category;

AVG

Arithmetic mean of non-null values.

SELECT AVG(price) AS avg_price FROM products;

SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

MIN / MAX

Minimum and maximum values. Works on numeric, string, and timestamp types.

SELECT
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive,
    MIN(created_at) AS first_order,
    MAX(created_at) AS latest_order
FROM orders;

Statistical Aggregates

STDDEV / STDDEV_POP

Population standard deviation. STDDEV is an alias for STDDEV_POP.

SELECT STDDEV(response_time) AS stddev_ms
FROM api_requests
WHERE endpoint = '/api/users';

STDDEV_SAMP

Sample standard deviation (uses Bessel’s correction: divides by N-1).

SELECT
    endpoint,
    AVG(response_time) AS mean_ms,
    STDDEV_SAMP(response_time) AS stddev_ms
FROM api_requests
GROUP BY endpoint;

VAR_POP / VARIANCE

Population variance. VARIANCE is an alias for VAR_POP.

SELECT VAR_POP(score) AS score_variance
FROM exam_results;

VAR_SAMP

Sample variance (Bessel’s correction).

SELECT
    subject,
    AVG(score) AS mean,
    VAR_SAMP(score) AS variance
FROM exam_results
GROUP BY subject;

MEDIAN

The middle value when sorted. For an even number of values, returns the average of the two middle values.

SELECT MEDIAN(response_time) AS median_ms
FROM api_requests;

SELECT
    endpoint,
    MEDIAN(response_time) AS p50,
    AVG(response_time) AS mean
FROM api_requests
GROUP BY endpoint;

MODE

The most frequently occurring value. If there are ties, returns one of the modes.

SELECT MODE(category) AS most_common_category
FROM products;

SELECT
    region,
    MODE(payment_method) AS preferred_payment
FROM orders
GROUP BY region;

Percentile Aggregates

PERCENTILE_CONT

Continuous percentile. Interpolates between adjacent values if the percentile falls between two data points. The argument is a fraction between 0 and 1.

-- 95th percentile response time
SELECT PERCENTILE_CONT(0.95) AS p95
FROM api_requests;

SELECT
    endpoint,
    PERCENTILE_CONT(0.5) AS p50,
    PERCENTILE_CONT(0.95) AS p95,
    PERCENTILE_CONT(0.99) AS p99
FROM api_requests
GROUP BY endpoint;

PERCENTILE_DISC

Discrete percentile. Returns an actual value from the dataset (the first value whose cumulative distribution exceeds the specified fraction).

SELECT PERCENTILE_DISC(0.5) AS median_score
FROM exam_results;

Approximate Aggregates

APPROX_COUNT_DISTINCT

Approximate distinct count using HyperLogLog. Returns an estimate within ~2% error with much less memory than exact COUNT(DISTINCT ...). Useful for high-cardinality columns.

-- Approximate unique visitor count
SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_unique_users
FROM page_views;

-- Compare exact vs. approximate
SELECT
    COUNT(DISTINCT user_id) AS exact,
    APPROX_COUNT_DISTINCT(user_id) AS approx
FROM page_views;

String Aggregates

STRING_AGG

Concatenate values from multiple rows into a single string with a separator.

SELECT STRING_AGG(tag, ', ') AS all_tags
FROM article_tags
WHERE article_id = 42;
-- Result: 'rust, database, performance'

SELECT
    department,
    STRING_AGG(name, '; ') AS team_members
FROM employees
GROUP BY department;

GROUP_CONCAT

MySQL-compatible alias for string concatenation. Uses comma as the default separator.

SELECT
    order_id,
    GROUP_CONCAT(product) AS products
FROM order_items
GROUP BY order_id;

Collection Aggregates

ARRAY_AGG

Collect values into an array.

Aliases: JSON_ARRAYAGG, JSONB_AGG, JSON_AGG — these all behave identically to ARRAY_AGG and return the collected values formatted as a JSON array.

SELECT
    customer_id,
    ARRAY_AGG(product) AS purchased_products
FROM order_items
GROUP BY customer_id;

-- JSON_ARRAYAGG / JSONB_AGG / JSON_AGG are aliases for ARRAY_AGG
SELECT
    department,
    JSON_AGG(name) AS team_members
FROM employees
GROUP BY department;

Boolean Aggregates

BOOL_AND

Returns TRUE if all input values are TRUE. Returns NULL if all inputs are NULL.

-- Check if all items in an order have shipped
SELECT
    order_id,
    BOOL_AND(is_shipped) AS fully_shipped
FROM order_items
GROUP BY order_id;

BOOL_OR

Returns TRUE if any input value is TRUE.

-- Check if any item in an order has an issue
SELECT
    order_id,
    BOOL_OR(has_defect) AS has_any_defect
FROM order_items
GROUP BY order_id;

Ordered-Set Aggregates

FIRST

Returns the value associated with the earliest (minimum) timestamp. Takes two arguments: the value column and the ordering column.

-- First recorded status for each order
SELECT
    order_id,
    FIRST(status, updated_at) AS initial_status
FROM order_status_history
GROUP BY order_id;

LAST

Returns the value associated with the latest (maximum) timestamp.

-- Most recent status for each order
SELECT
    order_id,
    LAST(status, updated_at) AS current_status
FROM order_status_history
GROUP BY order_id;

ARG_MIN / MIN_BY

Returns the value of the first argument where the second argument is at its minimum.

Aliases: ARGMIN, MIN_BY

-- Cheapest product name
SELECT ARG_MIN(name, price) AS cheapest_product FROM products;

-- Per-category cheapest product
SELECT
    category,
    MIN_BY(name, price) AS cheapest_product,
    MIN(price) AS lowest_price
FROM products
GROUP BY category;

ARG_MAX / MAX_BY

Returns the value of the first argument where the second argument is at its maximum.

Aliases: ARGMAX, MAX_BY

-- Highest-scoring student's name
SELECT ARG_MAX(name, score) AS top_student FROM exam_results;

-- Per-department: who earns the most?
SELECT
    department,
    MAX_BY(name, salary) AS highest_paid,
    MAX(salary) AS top_salary
FROM employees
GROUP BY department;

ANY_VALUE

Returns an arbitrary non-null value from the group. Useful when you know all values in a group are the same, or when you just need any representative value.

SELECT
    customer_id,
    ANY_VALUE(customer_name) AS name,  -- same for all rows in group
    SUM(total) AS total_spent
FROM orders
GROUP BY customer_id;

FILTER Clause

Any aggregate function can include a FILTER (WHERE ...) clause to restrict which rows are included in the aggregation, without affecting other aggregates in the same query.

SELECT
    COUNT(*) AS total_orders,
    COUNT(*) FILTER (WHERE status = 'shipped') AS shipped,
    COUNT(*) FILTER (WHERE status = 'pending') AS pending,
    COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
    SUM(total) FILTER (WHERE status != 'cancelled') AS active_revenue
FROM orders;

This is equivalent to using CASE expressions but more readable:

-- Equivalent without FILTER (less readable)
SELECT
    COUNT(*) AS total_orders,
    SUM(CASE WHEN status = 'shipped' THEN 1 ELSE 0 END) AS shipped
FROM orders;

DISTINCT Modifier

Apply DISTINCT inside an aggregate to operate only on unique values:

SELECT
    COUNT(category) AS total_entries,           -- counts all non-null rows
    COUNT(DISTINCT category) AS unique_categories  -- counts unique values
FROM products;

SELECT STRING_AGG(DISTINCT category, ', ') AS categories
FROM products;

GROUP BY

Basic GROUP BY

SELECT category, COUNT(*) AS num_products, AVG(price) AS avg_price
FROM products
GROUP BY category;

Multiple Grouping Columns

SELECT region, category, SUM(revenue) AS total
FROM sales
GROUP BY region, category
ORDER BY total DESC;

HAVING

Filter groups (as opposed to WHERE, which filters individual rows):

-- Categories with more than 10 products
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
HAVING COUNT(*) > 10
ORDER BY cnt DESC;

-- Customers who spent more than $1000
SELECT customer_id, SUM(total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 1000;

GROUPING SETS

Compute aggregates for multiple grouping levels in a single query:

SELECT region, category, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
    (region, category),  -- per region + category
    (region),            -- per region subtotal
    ()                   -- grand total
);

ROLLUP

Hierarchical subtotals — shorthand for common GROUPING SETS patterns:

-- Equivalent to GROUPING SETS ((region, category), (region), ())
SELECT region, category, SUM(revenue)
FROM sales
GROUP BY ROLLUP(region, category);

CUBE

All possible combinations of grouping columns:

-- Equivalent to GROUPING SETS ((region, category), (region), (category), ())
SELECT region, category, SUM(revenue)
FROM sales
GROUP BY CUBE(region, category);

Performance Notes

ParticleDB uses multiple specialized aggregation engines, automatically selected based on query characteristics:

Ungrouped aggregates (no GROUP BY): Resolved from precomputed zone-level statistics when possible (O(chunks) instead of O(rows)). COUNT(*) resolves from metadata in O(1).
Low-cardinality groups (< ~8 groups): SIMD bitmask-based accumulation — scans each group in parallel using vectorized masks.
Int64 GROUP BY with moderate cardinality: Dense-array direct accumulation — O(1) per row with no hashing.
High-cardinality sorted data: Range-limited per-thread dense arrays that exploit data locality.
High-cardinality unsorted: Radix-partitioned hash aggregation with parallel merge.
Filter+aggregate fusion: Filter predicates and aggregation are evaluated in a single pass without materializing intermediate results.

All paths run in parallel across CPU cores.