Skip to content

Aggregations

Aggregate functions compute a single result from a set of input rows. Use them with GROUP BY to compute per-group results, or without GROUP BY for a global result across all rows.

-- Count all rows
SELECT COUNT(*) FROM orders;
-- Count non-null values in a column
SELECT COUNT(email) FROM users;
-- Count distinct values
SELECT COUNT(DISTINCT category) FROM products;

COUNT(*) is highly optimized — ParticleDB can resolve it from metadata in O(1) when no filter is applied.

Sum of non-null values. Returns NULL if all inputs are NULL.

SELECT SUM(total) AS total_revenue FROM orders;
SELECT category, SUM(price * quantity) AS category_revenue
FROM order_items
GROUP BY category;

Arithmetic mean of non-null values.

SELECT AVG(price) AS avg_price FROM products;
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

Minimum and maximum values. Works on numeric, string, and timestamp types.

SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MIN(created_at) AS first_order,
MAX(created_at) AS latest_order
FROM orders;

Population standard deviation. STDDEV is an alias for STDDEV_POP.

SELECT STDDEV(response_time) AS stddev_ms
FROM api_requests
WHERE endpoint = '/api/users';

Sample standard deviation (uses Bessel’s correction: divides by N-1).

SELECT
endpoint,
AVG(response_time) AS mean_ms,
STDDEV_SAMP(response_time) AS stddev_ms
FROM api_requests
GROUP BY endpoint;

Population variance. VARIANCE is an alias for VAR_POP.

SELECT VAR_POP(score) AS score_variance
FROM exam_results;

Sample variance (Bessel’s correction).

SELECT
subject,
AVG(score) AS mean,
VAR_SAMP(score) AS variance
FROM exam_results
GROUP BY subject;

The middle value when sorted. For an even number of values, returns the average of the two middle values.

SELECT MEDIAN(response_time) AS median_ms
FROM api_requests;
SELECT
endpoint,
MEDIAN(response_time) AS p50,
AVG(response_time) AS mean
FROM api_requests
GROUP BY endpoint;

The most frequently occurring value. If there are ties, returns one of the modes.

SELECT MODE(category) AS most_common_category
FROM products;
SELECT
region,
MODE(payment_method) AS preferred_payment
FROM orders
GROUP BY region;

Continuous percentile. Interpolates between adjacent values if the percentile falls between two data points. The argument is a fraction between 0 and 1.

-- 95th percentile response time
SELECT PERCENTILE_CONT(0.95) AS p95
FROM api_requests;
SELECT
endpoint,
PERCENTILE_CONT(0.5) AS p50,
PERCENTILE_CONT(0.95) AS p95,
PERCENTILE_CONT(0.99) AS p99
FROM api_requests
GROUP BY endpoint;

Discrete percentile. Returns an actual value from the dataset (the first value whose cumulative distribution exceeds the specified fraction).

SELECT PERCENTILE_DISC(0.5) AS median_score
FROM exam_results;

Approximate distinct count using HyperLogLog. Returns an estimate within ~2% error with much less memory than exact COUNT(DISTINCT ...). Useful for high-cardinality columns.

-- Approximate unique visitor count
SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_unique_users
FROM page_views;
-- Compare exact vs. approximate
SELECT
COUNT(DISTINCT user_id) AS exact,
APPROX_COUNT_DISTINCT(user_id) AS approx
FROM page_views;

Concatenate values from multiple rows into a single string with a separator.

SELECT STRING_AGG(tag, ', ') AS all_tags
FROM article_tags
WHERE article_id = 42;
-- Result: 'rust, database, performance'
SELECT
department,
STRING_AGG(name, '; ') AS team_members
FROM employees
GROUP BY department;

MySQL-compatible alias for string concatenation. Uses comma as the default separator.

SELECT
order_id,
GROUP_CONCAT(product) AS products
FROM order_items
GROUP BY order_id;

Collect values into an array.

Aliases: JSON_ARRAYAGG, JSONB_AGG, JSON_AGG — these all behave identically to ARRAY_AGG and return the collected values formatted as a JSON array.

SELECT
customer_id,
ARRAY_AGG(product) AS purchased_products
FROM order_items
GROUP BY customer_id;
-- JSON_ARRAYAGG / JSONB_AGG / JSON_AGG are aliases for ARRAY_AGG
SELECT
department,
JSON_AGG(name) AS team_members
FROM employees
GROUP BY department;

Returns TRUE if all input values are TRUE. Returns NULL if all inputs are NULL.

-- Check if all items in an order have shipped
SELECT
order_id,
BOOL_AND(is_shipped) AS fully_shipped
FROM order_items
GROUP BY order_id;

Returns TRUE if any input value is TRUE.

-- Check if any item in an order has an issue
SELECT
order_id,
BOOL_OR(has_defect) AS has_any_defect
FROM order_items
GROUP BY order_id;

Returns the value associated with the earliest (minimum) timestamp. Takes two arguments: the value column and the ordering column.

-- First recorded status for each order
SELECT
order_id,
FIRST(status, updated_at) AS initial_status
FROM order_status_history
GROUP BY order_id;

Returns the value associated with the latest (maximum) timestamp.

-- Most recent status for each order
SELECT
order_id,
LAST(status, updated_at) AS current_status
FROM order_status_history
GROUP BY order_id;

Returns the value of the first argument where the second argument is at its minimum.

Aliases: ARGMIN, MIN_BY

-- Cheapest product name
SELECT ARG_MIN(name, price) AS cheapest_product FROM products;
-- Per-category cheapest product
SELECT
category,
MIN_BY(name, price) AS cheapest_product,
MIN(price) AS lowest_price
FROM products
GROUP BY category;

Returns the value of the first argument where the second argument is at its maximum.

Aliases: ARGMAX, MAX_BY

-- Highest-scoring student's name
SELECT ARG_MAX(name, score) AS top_student FROM exam_results;
-- Per-department: who earns the most?
SELECT
department,
MAX_BY(name, salary) AS highest_paid,
MAX(salary) AS top_salary
FROM employees
GROUP BY department;

Returns an arbitrary non-null value from the group. Useful when you know all values in a group are the same, or when you just need any representative value.

SELECT
customer_id,
ANY_VALUE(customer_name) AS name, -- same for all rows in group
SUM(total) AS total_spent
FROM orders
GROUP BY customer_id;

Any aggregate function can include a FILTER (WHERE ...) clause to restrict which rows are included in the aggregation, without affecting other aggregates in the same query.

SELECT
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE status = 'shipped') AS shipped,
COUNT(*) FILTER (WHERE status = 'pending') AS pending,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
SUM(total) FILTER (WHERE status != 'cancelled') AS active_revenue
FROM orders;

This is equivalent to using CASE expressions but more readable:

-- Equivalent without FILTER (less readable)
SELECT
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'shipped' THEN 1 ELSE 0 END) AS shipped
FROM orders;

Apply DISTINCT inside an aggregate to operate only on unique values:

SELECT
COUNT(category) AS total_entries, -- counts all non-null rows
COUNT(DISTINCT category) AS unique_categories -- counts unique values
FROM products;
SELECT STRING_AGG(DISTINCT category, ', ') AS categories
FROM products;

SELECT category, COUNT(*) AS num_products, AVG(price) AS avg_price
FROM products
GROUP BY category;
SELECT region, category, SUM(revenue) AS total
FROM sales
GROUP BY region, category
ORDER BY total DESC;

Filter groups (as opposed to WHERE, which filters individual rows):

-- Categories with more than 10 products
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
HAVING COUNT(*) > 10
ORDER BY cnt DESC;
-- Customers who spent more than $1000
SELECT customer_id, SUM(total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 1000;

Compute aggregates for multiple grouping levels in a single query:

SELECT region, category, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
(region, category), -- per region + category
(region), -- per region subtotal
() -- grand total
);

Hierarchical subtotals — shorthand for common GROUPING SETS patterns:

-- Equivalent to GROUPING SETS ((region, category), (region), ())
SELECT region, category, SUM(revenue)
FROM sales
GROUP BY ROLLUP(region, category);

All possible combinations of grouping columns:

-- Equivalent to GROUPING SETS ((region, category), (region), (category), ())
SELECT region, category, SUM(revenue)
FROM sales
GROUP BY CUBE(region, category);

ParticleDB uses multiple specialized aggregation engines, automatically selected based on query characteristics:

  • Ungrouped aggregates (no GROUP BY): Resolved from precomputed zone-level statistics when possible (O(chunks) instead of O(rows)). COUNT(*) resolves from metadata in O(1).
  • Low-cardinality groups (< ~8 groups): SIMD bitmask-based accumulation — scans each group in parallel using vectorized masks.
  • Int64 GROUP BY with moderate cardinality: Dense-array direct accumulation — O(1) per row with no hashing.
  • High-cardinality sorted data: Range-limited per-thread dense arrays that exploit data locality.
  • High-cardinality unsorted: Radix-partitioned hash aggregation with parallel merge.
  • Filter+aggregate fusion: Filter predicates and aggregation are evaluated in a single pass without materializing intermediate results.

All paths use Rayon-based parallelism across CPU cores.