Observability — metrics + logs + alerting. Replaces Datadog / New Relic / SignalFx. PMA bundles Grafana + Prometheus (metrics) + Loki (logs) as the observability stack. Grafana is the UI; Prometheus + Loki are the data stores.
| Field | Value |
|---|---|
| Container | ${CONTAINER_PREFIX}grafana |
| Image | grafana/grafana |
| Internal port | 3000 |
| External port | ${GRAFANA_PORT} (default 8086) |
| Database | SQLite by default (or Postgres for HA) |
| Storage | grafana-data volume |
| Backup type | volume |
| Classification | enterprise |
| Profiles | data, enterprise, full |
| Default SSO | oidc |
| Companion services | Prometheus, Loki, Promtail |
PMA's bundled dashboards cover: per-service container health, request rate / latency, DB connection pool usage, disk usage, tunnel uptime.
| Recipe | What it does |
|---|---|
| just grafana-status / -logs / -restart | Lifecycle |
| just grafana-list-users | List users |
| just grafana-create-user --email=… | Create user |
| just grafana-dashboards-export | Export current dashboards as JSON (for git) |
| just grafana-dashboards-import | Import dashboards from packages/grafana/dashboards/ |
Per-companion-service recipes:
| Recipe | What it does |
|---|---|
| just prometheus-status / -logs | Prometheus lifecycle |
| just loki-status / -logs | Loki lifecycle |
| just promtail-status / -logs | Log shipper lifecycle |
| just prometheus-targets | Show which services Prometheus is scraping |
Grafana → Alerting → Notification policies → + Contact point
- Type: webhook
- URL: ${MATTERMOST_INCOMING_WEBHOOK_URL}
PMA ships an n8n workflow that wraps Mattermost incoming webhooks so the alert
formatting is consistent (per-alert channel, mention oncall, etc.).
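A webhook contact point posts Grafana's own alert JSON, while a Mattermost incoming webhook expects a body with a `text` field, so something has to reshape the payload in between. The sketch below shows that reshaping in plain Python. It is not the n8n workflow PMA ships; the function names and the `ops-alerts` channel are hypothetical, and only the commonly documented payload fields (`status`, `alerts[].labels`, `alerts[].annotations`) are assumed.

```python
import json
import urllib.request
from typing import Optional


def format_alert(grafana_payload: dict, channel: Optional[str] = None) -> dict:
    """Build a Mattermost incoming-webhook body from a Grafana webhook payload."""
    status = grafana_payload.get("status", "unknown").upper()
    lines = []
    for alert in grafana_payload.get("alerts", []):
        name = alert.get("labels", {}).get("alertname", "unnamed alert")
        summary = alert.get("annotations", {}).get("summary", "")
        line = f"- **{name}**"
        if summary:
            line += f": {summary}"
        lines.append(line)
    body = {"text": f"#### [{status}] {len(lines)} alert(s)\n" + "\n".join(lines)}
    if channel:
        # Optional override; Mattermost falls back to the webhook's default channel.
        body["channel"] = channel
    return body


def post_to_mattermost(webhook_url: str, body: dict) -> int:
    """POST the body as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping the formatting in one pure function makes it easy to check the message layout without a live Mattermost instance.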
Grafana → Dashboards → New
→ Add panel → Query → Prometheus → write PromQL
→ Visualisation → Line / Bar / Gauge / Stat
→ Save
Export the dashboard JSON + commit to packages/grafana/dashboards/ so it
deploys with every fresh install.
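For reference, an export along these lines can be sketched against Grafana's HTTP API (`/api/search` and `/api/dashboards/uid/<uid>`). This is an illustrative stand-in, not the implementation behind `just grafana-dashboards-export`. The service-account token, the one-file-per-uid naming, and the choice to null out `id` and drop `version` so git diffs stay stable are all assumptions.

```python
import json
import urllib.request
from pathlib import Path


def fetch_json(base_url: str, path: str, token: str):
    """GET a Grafana API path with a bearer token and decode the JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def normalise(dashboard: dict) -> str:
    """Strip instance-specific fields and serialise with stable key order."""
    clean = dict(dashboard)
    clean["id"] = None          # numeric id differs between installs
    clean.pop("version", None)  # bumps on every save, noisy in diffs
    return json.dumps(clean, indent=2, sort_keys=True) + "\n"


def export_all(base_url: str, token: str, out_dir: str) -> list:
    """Write every dashboard to <out_dir>/<uid>.json and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for hit in fetch_json(base_url, "/api/search?type=dash-db", token):
        full = fetch_json(base_url, f"/api/dashboards/uid/{hit['uid']}", token)
        target = out / f"{hit['uid']}.json"
        target.write_text(normalise(full["dashboard"]))
        written.append(target)
    return written
```

A call such as `export_all("http://localhost:8086", token, "packages/grafana/dashboards")` would then leave files ready to commit, assuming Grafana is reachable on the default external port.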
Grafana → Explore → Loki
{service="redmine"} |= "ERROR" # all error lines from redmine
{service=~"redmine|mattermost"} |~ "(?i)oauth" # OAuth-related lines from either
rate({service="redmine"}[5m]) # log lines per second
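The same log queries can be run outside the UI through Loki's `query_range` HTTP endpoint. The sketch below builds such a request with an explicit, bounded time window. The `http://loki:3100` address is an assumption (3100 is Loki's usual default port), and the function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request


def build_query_range_url(base_url, logql, minutes=60, limit=100, now_ns=None):
    """Return a query_range URL covering the last `minutes` minutes."""
    end_ns = now_ns if now_ns is not None else time.time_ns()
    start_ns = end_ns - minutes * 60 * 1_000_000_000  # Loki takes nanosecond epochs
    params = urllib.parse.urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


def fetch_lines(url):
    """Run a log query (not a metric query such as rate()) and return the raw lines."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    # Log queries return streams; each value is a [timestamp, line] pair.
    return [line for stream in data["data"]["result"] for _ts, line in stream["values"]]


if __name__ == "__main__":
    url = build_query_range_url("http://loki:3100", '{service="redmine"} |= "ERROR"')
    for line in fetch_lines(url):
        print(line)
```

Passing `start` and `end` on every call keeps the query bounded, which matters because Loki's index is organised by time.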
just backup grafana # tars the grafana-data volume
just restore grafana latest
For Prometheus + Loki data, the volumes are large (gigabytes). The backup
captures them but consider whether you actually need historical metrics
restored — for most outages, fresh metrics work fine.
sso.type: oidc
just sso-check grafana
just sso-fix grafana
Grafana reads OIDC config from env vars set by bootstrap. Authentik groups
map to Grafana roles via the role_attribute_path config:
# In grafana.ini (set via env vars in PMA bootstrap)
role_attribute_path = contains(groups[*], 'admins') && 'Admin' || 'Viewer'
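To make the mapping concrete, here is the same decision written as plain Python. Grafana itself evaluates the expression as JMESPath against the OIDC claims; this is only an equivalent of the logic for the expression shown, and treating a missing `groups` claim as "no groups" is an assumption of the sketch.

```python
def grafana_role(groups):
    """Mirror: contains(groups[*], 'admins') && 'Admin' || 'Viewer'."""
    # contains() on an array matches whole elements, so 'administrators'
    # does not count as 'admins'.
    return "Admin" if "admins" in (groups or []) else "Viewer"


if __name__ == "__main__":
    for claim in (["admins", "devs"], ["devs"], []):
        print(claim, "->", grafana_role(claim))
```

Anyone outside the `admins` group falls through to `Viewer`, so granting edit rights would need an extra branch in the expression.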
Grafana, Prometheus, Loki are three services. They're listed under one PMA package but spawn three sets of containers. Manage individually via just grafana-* / just prometheus-* / just loki-*.
Prometheus scrape config is static. Targets are listed in packages/grafana/prometheus.yml. Adding a new service to monitor: add a scrape config + restart Prometheus. PMA's generic services expose a /metrics endpoint where available.
Loki indexes are time-bucketed. Querying "all errors ever" is expensive — always include a time range. The default 1-hour range is sensible.
Promtail config lives in packages/grafana/promtail.yml. It tails Docker container logs (via /var/lib/docker/containers/) and ships them to Loki with labels.
Dashboard provisioning vs UI edits. Dashboards provisioned from packages/grafana/dashboards/ get "Provisioned" markers + can't be deleted via UI. UI edits to provisioned dashboards are lost on restart unless re-exported + committed.
Default admin user. Bootstrap creates admin with a password in .env (GRAFANA_ADMIN_PASSWORD). DO NOT delete this — recovery account. Rotate periodically.
| Symptom | First check |
|---|---|
| Dashboards show "no data" | Prometheus not scraping or query window is wrong; check Prometheus targets |
| Logs not showing in Explore | Promtail not shipping; just promtail-logs |
| Alerts not firing | Notification policy + contact point config; test via "Test" button |
| Login fails after Authentik change | just sso-check grafana |
| Grafana UI very slow | Long time-range query against Prometheus; reduce range or add recording rules |
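For the "no data" case, the target check can also be scripted against Prometheus's `/api/v1/targets` endpoint. The sketch below lists targets whose last scrape did not succeed. It is not what `just prometheus-targets` runs, and the `http://prometheus:9090` address is an assumption (9090 is the usual Prometheus default).

```python
import json
import urllib.request


def unhealthy_targets(targets_response: dict) -> list:
    """Return job, scrape URL and last error for every target not reporting 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        {
            "job": t.get("labels", {}).get("job", "?"),
            "url": t.get("scrapeUrl", ""),
            "error": t.get("lastError", ""),
        }
        for t in active
        if t.get("health") != "up"
    ]


def fetch_targets(base_url: str) -> dict:
    """GET /api/v1/targets and decode the JSON reply."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for bad in unhealthy_targets(fetch_targets("http://prometheus:9090")):
        print(f"{bad['job']}: {bad['url']} ({bad['error']})")
```

An empty result with dashboards still blank points at the query window rather than scraping.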
/pma/services/superset: analytics on operational data (vs Grafana for metrics + logs).
/pma/reference/cli/sso: SSO management.