Loki Metrics Reference
This reference document describes the metrics that are automatically derived from logs collected by Loki in the NAIS platform.
loki:service:loglevel:count1m
This metric represents the count of logs aggregated over a 1-minute window, categorized by service and log level.
Description
The loki:service:loglevel:count1m metric provides a pre-aggregated count of log entries for each 1-minute interval, grouped by service, namespace, cluster, and log level. This metric is particularly useful for:
- Monitoring the volume of logs at different severity levels
- Setting up alerts for unusual increases in error or warning logs
- Creating dashboards to visualize logging patterns across services
- Identifying services with excessive logging
Labels
| Label | Description | Example Values |
|---|---|---|
| service_name | The name of the service or application that generated the logs | my-app, user-api |
| service_namespace | The Kubernetes namespace where the service is running | team-a, default |
| k8s_cluster_name | The name of the Kubernetes cluster | dev-gcp, prod-gcp |
| detected_level | The log level or severity of the log entries | error, warn, info, debug, trace |
Usage Examples
Prometheus Query Examples
Count of error logs for a specific service in the last hour:
sum(sum_over_time(loki:service:loglevel:count1m{service_name="my-app", service_namespace="my-team", detected_level="error"}[60m:1m]))
Ratio of errors to total logs for all services in a namespace:
sum(loki:service:loglevel:count1m{service_namespace="team-a", detected_level="error"}) /
sum(loki:service:loglevel:count1m{service_namespace="team-a"})
Alert Examples
Alert on high number of error logs:
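A minimal sketch of such an alert, written in the standard Prometheus rule-file format. The group name, alert name, threshold of 100, the 5-minute window, and the severity label are all illustrative assumptions, not values taken from the platform; depending on how alerts are deployed in your cluster, the rule may need to be wrapped in a different resource.

```yaml
# Hypothetical alerting rule; names, threshold, and durations are placeholders.
groups:
  - name: loki-log-alerts            # assumed group name
    rules:
      - alert: HighErrorLogCount     # assumed alert name
        # Sum the pre-aggregated 1-minute counts over the last 5 minutes,
        # sampled once per minute, and compare against an example threshold.
        expr: |
          sum by (service_name, service_namespace) (
            sum_over_time(
              loki:service:loglevel:count1m{service_name="my-app", service_namespace="my-team", detected_level="error"}[5m:1m]
            )
          ) > 100
        for: 5m                      # condition must hold this long before firing
        labels:
          severity: warning
        annotations:
          summary: "High number of error logs for {{ $labels.service_name }}"
```

The threshold should be tuned against the service's normal error volume; a service that routinely logs a few errors per minute would need a higher value than one that is normally silent.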
Best Practices
- Do not use the increase() or rate() functions with this metric, as it is already pre-aggregated over 1-minute intervals
- For longer time ranges, use a subquery range such as [60m:1m] to sample at 1-minute intervals, and total the samples with a function like sum_over_time()
- Consider setting appropriate thresholds for alerts based on your application's normal logging behavior
- Combine with other metrics (like HTTP status codes) for more comprehensive service health monitoring