Aggregations
Aggregate functions compute a single result from a set of input rows. Use them with GROUP BY to compute per-group results, or without GROUP BY for a global result across all rows.
Basic Aggregates
Section titled “Basic Aggregates”-- Count all rowsSELECT COUNT(*) FROM orders;
-- Count non-null values in a columnSELECT COUNT(email) FROM users;
-- Count distinct valuesSELECT COUNT(DISTINCT category) FROM products;COUNT(*) is highly optimized — ParticleDB can resolve it from metadata in O(1) when no filter is applied.
Sum of non-null values. Returns NULL if all inputs are NULL.
SELECT SUM(total) AS total_revenue FROM orders;
SELECT category, SUM(price * quantity) AS category_revenueFROM order_itemsGROUP BY category;Arithmetic mean of non-null values.
SELECT AVG(price) AS avg_price FROM products;
SELECT department, AVG(salary) AS avg_salaryFROM employeesGROUP BY departmentORDER BY avg_salary DESC;MIN / MAX
Section titled “MIN / MAX”Minimum and maximum values. Works on numeric, string, and timestamp types.
SELECT MIN(price) AS cheapest, MAX(price) AS most_expensive, MIN(created_at) AS first_order, MAX(created_at) AS latest_orderFROM orders;Statistical Aggregates
Section titled “Statistical Aggregates”STDDEV / STDDEV_POP
Section titled “STDDEV / STDDEV_POP”Population standard deviation. STDDEV is an alias for STDDEV_POP.
SELECT STDDEV(response_time) AS stddev_msFROM api_requestsWHERE endpoint = '/api/users';STDDEV_SAMP
Section titled “STDDEV_SAMP”Sample standard deviation (uses Bessel’s correction: divides by N-1).
SELECT endpoint, AVG(response_time) AS mean_ms, STDDEV_SAMP(response_time) AS stddev_msFROM api_requestsGROUP BY endpoint;VAR_POP / VARIANCE
Section titled “VAR_POP / VARIANCE”Population variance. VARIANCE is an alias for VAR_POP.
SELECT VAR_POP(score) AS score_varianceFROM exam_results;VAR_SAMP
Section titled “VAR_SAMP”Sample variance (Bessel’s correction).
SELECT subject, AVG(score) AS mean, VAR_SAMP(score) AS varianceFROM exam_resultsGROUP BY subject;MEDIAN
Section titled “MEDIAN”The middle value when sorted. For an even number of values, returns the average of the two middle values.
SELECT MEDIAN(response_time) AS median_msFROM api_requests;
SELECT endpoint, MEDIAN(response_time) AS p50, AVG(response_time) AS meanFROM api_requestsGROUP BY endpoint;The most frequently occurring value. If there are ties, returns one of the modes.
SELECT MODE(category) AS most_common_categoryFROM products;
SELECT region, MODE(payment_method) AS preferred_paymentFROM ordersGROUP BY region;Percentile Aggregates
Section titled “Percentile Aggregates”PERCENTILE_CONT
Section titled “PERCENTILE_CONT”Continuous percentile. Interpolates between adjacent values if the percentile falls between two data points. The argument is a fraction between 0 and 1.
-- 95th percentile response timeSELECT PERCENTILE_CONT(0.95) AS p95FROM api_requests;
SELECT endpoint, PERCENTILE_CONT(0.5) AS p50, PERCENTILE_CONT(0.95) AS p95, PERCENTILE_CONT(0.99) AS p99FROM api_requestsGROUP BY endpoint;PERCENTILE_DISC
Section titled “PERCENTILE_DISC”Discrete percentile. Returns an actual value from the dataset (the first value whose cumulative distribution exceeds the specified fraction).
SELECT PERCENTILE_DISC(0.5) AS median_scoreFROM exam_results;Approximate Aggregates
Section titled “Approximate Aggregates”APPROX_COUNT_DISTINCT
Section titled “APPROX_COUNT_DISTINCT”Approximate distinct count using HyperLogLog. Returns an estimate within ~2% error with much less memory than exact COUNT(DISTINCT ...). Useful for high-cardinality columns.
-- Approximate unique visitor countSELECT APPROX_COUNT_DISTINCT(user_id) AS approx_unique_usersFROM page_views;
-- Compare exact vs. approximateSELECT COUNT(DISTINCT user_id) AS exact, APPROX_COUNT_DISTINCT(user_id) AS approxFROM page_views;String Aggregates
Section titled “String Aggregates”STRING_AGG
Section titled “STRING_AGG”Concatenate values from multiple rows into a single string with a separator.
SELECT STRING_AGG(tag, ', ') AS all_tagsFROM article_tagsWHERE article_id = 42;-- Result: 'rust, database, performance'
SELECT department, STRING_AGG(name, '; ') AS team_membersFROM employeesGROUP BY department;GROUP_CONCAT
Section titled “GROUP_CONCAT”MySQL-compatible alias for string concatenation. Uses comma as the default separator.
SELECT order_id, GROUP_CONCAT(product) AS productsFROM order_itemsGROUP BY order_id;Collection Aggregates
Section titled “Collection Aggregates”ARRAY_AGG
Section titled “ARRAY_AGG”Collect values into an array.
Aliases: JSON_ARRAYAGG, JSONB_AGG, JSON_AGG — these all behave identically to ARRAY_AGG and return the collected values formatted as a JSON array.
SELECT customer_id, ARRAY_AGG(product) AS purchased_productsFROM order_itemsGROUP BY customer_id;
-- JSON_ARRAYAGG / JSONB_AGG / JSON_AGG are aliases for ARRAY_AGGSELECT department, JSON_AGG(name) AS team_membersFROM employeesGROUP BY department;Boolean Aggregates
Section titled “Boolean Aggregates”BOOL_AND
Section titled “BOOL_AND”Returns TRUE if all input values are TRUE. Returns NULL if all inputs are NULL.
-- Check if all items in an order have shippedSELECT order_id, BOOL_AND(is_shipped) AS fully_shippedFROM order_itemsGROUP BY order_id;BOOL_OR
Section titled “BOOL_OR”Returns TRUE if any input value is TRUE.
-- Check if any item in an order has an issueSELECT order_id, BOOL_OR(has_defect) AS has_any_defectFROM order_itemsGROUP BY order_id;Ordered-Set Aggregates
Section titled “Ordered-Set Aggregates”Returns the value associated with the earliest (minimum) timestamp. Takes two arguments: the value column and the ordering column.
-- First recorded status for each orderSELECT order_id, FIRST(status, updated_at) AS initial_statusFROM order_status_historyGROUP BY order_id;Returns the value associated with the latest (maximum) timestamp.
-- Most recent status for each orderSELECT order_id, LAST(status, updated_at) AS current_statusFROM order_status_historyGROUP BY order_id;ARG_MIN / MIN_BY
Section titled “ARG_MIN / MIN_BY”Returns the value of the first argument where the second argument is at its minimum.
Aliases: ARGMIN, MIN_BY
-- Cheapest product nameSELECT ARG_MIN(name, price) AS cheapest_product FROM products;
-- Per-category cheapest productSELECT category, MIN_BY(name, price) AS cheapest_product, MIN(price) AS lowest_priceFROM productsGROUP BY category;ARG_MAX / MAX_BY
Section titled “ARG_MAX / MAX_BY”Returns the value of the first argument where the second argument is at its maximum.
Aliases: ARGMAX, MAX_BY
-- Highest-scoring student's nameSELECT ARG_MAX(name, score) AS top_student FROM exam_results;
-- Per-department: who earns the most?SELECT department, MAX_BY(name, salary) AS highest_paid, MAX(salary) AS top_salaryFROM employeesGROUP BY department;ANY_VALUE
Section titled “ANY_VALUE”Returns an arbitrary non-null value from the group. Useful when you know all values in a group are the same, or when you just need any representative value.
SELECT customer_id, ANY_VALUE(customer_name) AS name, -- same for all rows in group SUM(total) AS total_spentFROM ordersGROUP BY customer_id;FILTER Clause
Section titled “FILTER Clause”Any aggregate function can include a FILTER (WHERE ...) clause to restrict which rows are included in the aggregation, without affecting other aggregates in the same query.
SELECT COUNT(*) AS total_orders, COUNT(*) FILTER (WHERE status = 'shipped') AS shipped, COUNT(*) FILTER (WHERE status = 'pending') AS pending, COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled, SUM(total) FILTER (WHERE status != 'cancelled') AS active_revenueFROM orders;This is equivalent to using CASE expressions but more readable:
-- Equivalent without FILTER (less readable)SELECT COUNT(*) AS total_orders, SUM(CASE WHEN status = 'shipped' THEN 1 ELSE 0 END) AS shippedFROM orders;DISTINCT Modifier
Section titled “DISTINCT Modifier”Apply DISTINCT inside an aggregate to operate only on unique values:
SELECT COUNT(category) AS total_entries, -- counts all non-null rows COUNT(DISTINCT category) AS unique_categories -- counts unique valuesFROM products;
SELECT STRING_AGG(DISTINCT category, ', ') AS categoriesFROM products;GROUP BY
Section titled “GROUP BY”Basic GROUP BY
Section titled “Basic GROUP BY”SELECT category, COUNT(*) AS num_products, AVG(price) AS avg_priceFROM productsGROUP BY category;Multiple Grouping Columns
Section titled “Multiple Grouping Columns”SELECT region, category, SUM(revenue) AS totalFROM salesGROUP BY region, categoryORDER BY total DESC;HAVING
Section titled “HAVING”Filter groups (as opposed to WHERE, which filters individual rows):
-- Categories with more than 10 productsSELECT category, COUNT(*) AS cntFROM productsGROUP BY categoryHAVING COUNT(*) > 10ORDER BY cnt DESC;
-- Customers who spent more than $1000SELECT customer_id, SUM(total) AS total_spentFROM ordersGROUP BY customer_idHAVING SUM(total) > 1000;GROUPING SETS
Section titled “GROUPING SETS”Compute aggregates for multiple grouping levels in a single query:
SELECT region, category, SUM(revenue)FROM salesGROUP BY GROUPING SETS ( (region, category), -- per region + category (region), -- per region subtotal () -- grand total);ROLLUP
Section titled “ROLLUP”Hierarchical subtotals — shorthand for common GROUPING SETS patterns:
-- Equivalent to GROUPING SETS ((region, category), (region), ())SELECT region, category, SUM(revenue)FROM salesGROUP BY ROLLUP(region, category);All possible combinations of grouping columns:
-- Equivalent to GROUPING SETS ((region, category), (region), (category), ())SELECT region, category, SUM(revenue)FROM salesGROUP BY CUBE(region, category);Performance Notes
Section titled “Performance Notes”ParticleDB uses multiple specialized aggregation engines, automatically selected based on query characteristics:
- Ungrouped aggregates (no GROUP BY): Resolved from precomputed zone-level statistics when possible (O(chunks) instead of O(rows)).
COUNT(*)resolves from metadata in O(1). - Low-cardinality groups (< ~8 groups): SIMD bitmask-based accumulation — scans each group in parallel using vectorized masks.
- Int64 GROUP BY with moderate cardinality: Dense-array direct accumulation — O(1) per row with no hashing.
- High-cardinality sorted data: Range-limited per-thread dense arrays that exploit data locality.
- High-cardinality unsorted: Radix-partitioned hash aggregation with parallel merge.
- Filter+aggregate fusion: Filter predicates and aggregation are evaluated in a single pass without materializing intermediate results.
All paths use Rayon-based parallelism across CPU cores.