
Monitoring

ParticleDB exposes a Prometheus-compatible metrics endpoint and includes a pre-built Grafana dashboard. Metrics collection is designed for zero hot-path impact — counters use cache-line-aligned atomics (~1ns per observation) and histograms use thread-local shards with no cross-thread synchronization.

ParticleDB runs a dedicated HTTP server for metrics, separate from the main HTTP API so it can be independently firewalled.

# Default: port 9090
particledb start --metrics-port 9090

Scrape metrics at:

GET http://<host>:9090/metrics

The endpoint returns metrics in Prometheus text exposition format.
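The exposition format is plain text: `# HELP`/`# TYPE` comment lines followed by one `name{labels} value` sample per line. A minimal Python parser, as a sketch, handling just that subset (the sample scrape below uses metric names from the tables on this page, but the real output will vary by version):

```python
import re

# Matches one sample line: metric name, optional {labels}, value.
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                       r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_metrics(text):
    """Parse Prometheus text exposition format into {metric_key: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m:
            labels = m.group('labels') or ''
            key = m.group('name') + ('{' + labels + '}' if labels else '')
            samples[key] = float(m.group('value'))
    return samples

scrape = """\
# HELP pdb_uptime_seconds Server uptime
# TYPE pdb_uptime_seconds gauge
pdb_uptime_seconds 3600
pdb_connections_active{protocol="pg"} 12
"""
metrics = parse_metrics(scrape)
```

Any Prometheus-compatible scraper does the equivalent of this on every poll; the parser above ignores timestamps and escaping, which a production client must handle.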

Add ParticleDB to your prometheus.yml:

scrape_configs:
  - job_name: 'particledb'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']

For Kubernetes with service discovery:

scrape_configs:
  - job_name: 'particledb'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods labeled app=particledb
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: particledb
        action: keep
      # Rewrite the scrape address to the port declared in the
      # prometheus.io/port annotation (e.g. "9090")
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}

The following metrics are gathered from the execution context and refreshed on every /metrics scrape:

| Metric | Type | Description |
| --- | --- | --- |
| particledb_tables_total | gauge | Number of tables |
| particledb_rows_total | gauge | Total rows across all tables |
| particledb_table_rows{table="..."} | gauge | Rows per table |
| particledb_uptime_seconds | gauge | Server uptime in seconds |

The following metrics are registered at server startup and updated on the hot path:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| pdb_queries_total | counter | type | Total queries by type (SELECT, INSERT, UPDATE, DELETE) |
| pdb_query_duration_seconds | histogram | | Query execution latency (100us to 30s buckets) |
| pdb_commit_duration_seconds | histogram | | Transaction commit latency |
| pdb_wal_fsync_duration_seconds | histogram | | WAL fsync latency (10us to 1s buckets) |
| pdb_connections_active | gauge | protocol | Active connections by protocol (pg, redis, grpc, http) |
| pdb_memtable_bytes | gauge | | Current memtable size in bytes |
| pdb_sst_files | gauge | level | SST file count per LSM level |
| pdb_disk_usage_bytes | gauge | | Total disk usage |
| pdb_memory_rss_bytes | gauge | | Process RSS memory |
| pdb_uptime_seconds | gauge | | Server uptime |
| pdb_version_info | gauge | version, edition, os, arch | Server version metadata (value always 1) |
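The counters and histograms above combine into the usual PromQL rate queries. For example (metric names from the table; the 5m window is an assumption, tune it to your scrape interval):

```promql
# Queries per second, broken down by statement type
sum by (type) (rate(pdb_queries_total[5m]))

# p99 query latency across all instances
histogram_quantile(0.99, sum by (le) (rate(pdb_query_duration_seconds_bucket[5m])))
```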

The PG wire layer automatically maintains pdb_connections_active{protocol="pg"} in the global registry, which is included in every /metrics export.


ParticleDB runs a background collector that periodically gathers system and database metrics:

  • Collection interval: 15 seconds (configurable)
  • System metrics: CPU usage, memory usage, disk I/O
  • PDB metrics: active queries, cache sizes, table statistics, connection counts
  • Retention: 1000 snapshots (~4.2 hours at 15s intervals)
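The retention behavior can be sketched as a bounded deque: once 1000 snapshots are held, each new one evicts the oldest, so at a 15-second interval the window covers about 4.2 hours. This is an illustrative model, not ParticleDB's internal structure:

```python
from collections import deque

RETENTION = 1000   # snapshots kept
INTERVAL_S = 15    # collection interval in seconds

# Oldest entries are dropped automatically once maxlen is reached.
snapshots = deque(maxlen=RETENTION)

# Simulate 2000 collection ticks; only the newest 1000 survive.
for tick in range(2000):
    snapshots.append({"tick": tick, "cpu_pct": 0.0, "rss_bytes": 0})

window_hours = RETENTION * INTERVAL_S / 3600  # ~4.2
```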

Historical metric snapshots are queryable via __pdb_stat_* virtual tables:

-- Recent CPU and memory usage
SELECT * FROM __pdb_stat_system ORDER BY timestamp DESC LIMIT 10;

ParticleDB exposes a health endpoint on the metrics server:

GET http://<host>:9090/health

Returns 200 OK when the server is healthy. Use this for Kubernetes liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 5

ParticleDB ships with a pre-built Grafana dashboard covering:

  • Query Performance: QPS, latency p50/p95/p99, error rate, slow queries
  • Resource Utilization: CPU usage, memory usage, disk I/O, active connections
  • Storage Metrics: table sizes, zone map coverage, cache hit rate, compaction progress
  • Replication Health: Raft commit lag, leader elections, follower status
  • Alert Status: active alerts and alert history

Use the built-in dashboard manager to push the dashboard to Grafana:

pdb dashboard deploy --grafana-url http://localhost:3000 --api-key <GRAFANA_API_KEY>

Or programmatically:

use spanner_metrics::dashboard::{DashboardConfig, DashboardManager};

// Runs inside an async context; `?` propagates Grafana API errors.
let config = DashboardConfig {
    grafana_url: "http://localhost:3000".to_string(),
    api_key: "glsa_xxx".to_string(),
    ..Default::default()
};
let manager = DashboardManager::new(config);
manager.provision_datasource("http://localhost:9090").await?;
manager.deploy_dashboard().await?;

The dashboard JSON template can also be imported directly into Grafana:

  1. Open Grafana and navigate to Dashboards > Import.
  2. Paste the dashboard JSON or upload the file.
  3. Select your Prometheus data source.
  4. Save.

The template expects a Prometheus data source pointed at ParticleDB’s metrics port (default 9090).


ParticleDB can forward metrics to external monitoring systems via a background exporter.

-- Datadog (StatsD)
SET monitoring_integration = 'datadog';
SET datadog_agent_host = 'localhost:8125';
SET monitoring_tags = 'env:production,service:particledb';
SET monitoring_interval_seconds = '10';

-- New Relic
SET monitoring_integration = 'newrelic';
-- Configure OTLP endpoint and API key via environment or config file

-- Grafana Cloud
SET monitoring_integration = 'grafana_cloud';
-- Configure remote_write endpoint via environment or config file
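For the Datadog path, metrics travel to the agent as DogStatsD lines over UDP. A hedged sketch of that wire format (the metric name below is hypothetical; the exporter's actual metric names are not documented here):

```python
def dogstatsd_gauge(name, value, tags):
    """Format a DogStatsD gauge line: <name>:<value>|g|#<tag1>,<tag2>."""
    line = f"{name}:{value}|g"
    if tags:
        line += "|#" + ",".join(tags)
    return line

# Built from the monitoring_tags setting above; this line would be sent
# as a UDP datagram to datadog_agent_host (localhost:8125).
line = dogstatsd_gauge("particledb.connections.active", 12,
                       ["env:production", "service:particledb"])
```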

ParticleDB tracks per-query statistics including fingerprinting, top-N tracking, and slow query logging.

-- Top 10 queries by total execution time
SELECT fingerprint, count, total_time, avg_time, max_time
FROM __pdb_query_stats
ORDER BY total_time DESC
LIMIT 10;
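Fingerprinting groups queries that differ only in literal values, so `WHERE id = 42` and `WHERE id = 7` count as one fingerprint. A minimal sketch of the idea (ParticleDB's actual normalization rules are internal and likely more thorough):

```python
import re

def fingerprint(sql):
    """Replace literals with placeholders so structurally identical
    queries share one fingerprint."""
    s = sql.strip()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)   # string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)  # numeric literals
    s = re.sub(r"\s+", " ", s)              # collapse whitespace
    return s

a = fingerprint("SELECT * FROM users WHERE id = 42")
b = fingerprint("SELECT * FROM users  WHERE id = 7")
```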

Slow queries are captured in a lock-free ring buffer and exposed via:

SELECT query_text, duration, rows_scanned, timestamp
FROM __pdb_slow_queries
ORDER BY timestamp DESC
LIMIT 20;
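The ring buffer gives bounded memory: once full, each new slow-query record overwrites the oldest. A single-threaded sketch of that overwrite semantics (the real buffer is lock-free and written from query threads; this Python model is not):

```python
class SlowQueryRing:
    """Fixed-capacity ring buffer; new entries overwrite the oldest."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.write_idx = 0  # monotonically increasing write counter

    def record(self, entry):
        self.buf[self.write_idx % self.capacity] = entry
        self.write_idx += 1

    def entries(self):
        """Return recorded entries, oldest first."""
        if self.write_idx <= self.capacity:
            return self.buf[:self.write_idx]
        start = self.write_idx % self.capacity
        return self.buf[start:] + self.buf[:start]

ring = SlowQueryRing(capacity=3)
for q in ["q1", "q2", "q3", "q4"]:
    ring.record(q)  # "q4" overwrites "q1"
```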

When --audit-log is enabled, all DDL and DML operations are recorded:

SELECT * FROM __pdb_audit_log
WHERE timestamp > NOW() - INTERVAL '1 hour'
ORDER BY timestamp DESC;

Add --audit-log-selects to also log SELECT queries (high volume).


For ad-hoc debugging, enable runtime diagnostics via environment variables:

| Variable | Description |
| --- | --- |
| PDB_WIRE_TRACE=1 | Log every PG wire message with timing |
| PDB_WIRE_DUMP=1 | Dump raw wire bytes |
| PDB_REDIS_PROFILE=N | Log Redis commands (sample every Nth) |
| PDB_OLTP_PROFILE=N | Log OLTP operations (sample every Nth) |
| PDB_AGG_TELEMETRY=1 | Log aggregation strategy selection |
PDB_WIRE_TRACE=1 PDB_AGG_TELEMETRY=1 particledb start

Recommended Prometheus alerting rules for production deployments:

groups:
  - name: particledb
    rules:
      - alert: HighQueryLatency
        expr: histogram_quantile(0.99, rate(pdb_query_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 query latency above 1s"
      - alert: HighConnectionCount
        expr: sum(pdb_connections_active) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Active connections nearing limit"
      - alert: WALFsyncSlow
        expr: histogram_quantile(0.99, rate(pdb_wal_fsync_duration_seconds_bucket[5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "WAL fsync p99 above 100ms -- check disk health"
      - alert: HighMemoryUsage
        expr: pdb_memory_rss_bytes > 30e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RSS memory above 30GB"
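The histogram_quantile expressions above estimate a quantile by linear interpolation inside the cumulative bucket that crosses the target rank. A simplified sketch of that calculation on one series of cumulative bucket counts (Prometheus applies it per series, after rate()):

```python
def histogram_quantile(q, buckets):
    """buckets: sorted (upper_bound, cumulative_count) pairs, ending
    with (inf, total). Interpolates linearly inside the bucket that
    contains the q-th quantile observation."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float('inf'):
                return prev_bound  # quantile falls in the overflow bucket
            # Fraction of the way through this bucket's observations.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# 90 observations <= 0.1s, 100 <= 1.0s, 100 total:
p50 = histogram_quantile(0.50, [(0.1, 90), (1.0, 100), (float('inf'), 100)])
```

This is why bucket boundaries matter: the estimate is only as precise as the bucket that contains the quantile, which is the reason the latency histograms above span 100us to 30s rather than a handful of coarse buckets.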