Loki Metrics Reference
This reference document describes the metrics that are automatically derived from logs collected by Loki in the NAIS platform.
loki:service:loglevel:count1m
This metric represents the count of logs aggregated over a 1-minute window, categorized by service and log level.
Description
The `loki:service:loglevel:count1m` metric provides a pre-aggregated count of log entries for each 1-minute interval, grouped by service, namespace, cluster, and log level. This metric is particularly useful for:
- Monitoring the volume of logs at different severity levels
- Setting up alerts for unusual increases in error or warning logs
- Creating dashboards to visualize logging patterns across services
- Identifying services with excessive logging
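As a rough illustration of the last use case, the query below ranks services by their current log volume. It is only a sketch: the namespace `team-a` is a placeholder, and the cutoff of 5 is an arbitrary assumption.

```promql
# Top 5 services by log entries in the most recent 1-minute window,
# summed across all log levels (the namespace value is a placeholder).
topk(5,
  sum by (service_name) (
    loki:service:loglevel:count1m{service_namespace="team-a"}
  )
)
```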
Labels
| Label | Description | Example Values |
|---|---|---|
| `service_name` | The name of the service or application that generated the logs | `my-app`, `user-api` |
| `service_namespace` | The Kubernetes namespace where the service is running | `team-a`, `default` |
| `k8s_cluster_name` | The name of the Kubernetes cluster | `dev-gcp`, `prod-gcp` |
| `detected_level` | The log level or severity of the log entries | `error`, `warn`, `info`, `debug`, `trace` |
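These labels can serve both as filters and as grouping keys. A minimal sketch, assuming a hypothetical service `my-app` in namespace `team-a` on cluster `dev-gcp` (all borrowed from the example values above):

```promql
# Per-level log count for one service in the latest 1-minute window.
# The three matchers filter; detected_level is kept as the grouping key.
sum by (detected_level) (
  loki:service:loglevel:count1m{
    service_name="my-app",
    service_namespace="team-a",
    k8s_cluster_name="dev-gcp"
  }
)
```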
Usage Examples
Prometheus Query Examples
Count of error logs for a specific service in the last hour:
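One way to write this, assuming a hypothetical service `my-app` in namespace `team-a`, is to sum the 1-minute counts over a 60-minute subquery. Treat it as a sketch rather than the canonical query:

```promql
# Approximate number of error log entries for my-app over the last hour.
# The [60m:1m] subquery samples the pre-aggregated 1-minute counts once per minute.
sum(
  sum_over_time(
    loki:service:loglevel:count1m{
      service_name="my-app",
      service_namespace="team-a",
      detected_level="error"
    }[60m:1m]
  )
)
```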
Ratio of errors to total logs for all services in a namespace:
sum(loki:service:loglevel:count1m{service_namespace="team-a", detected_level="error"}) /
sum(loki:service:loglevel:count1m{service_namespace="team-a"})
Alert Examples
Alert on high number of error logs:
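A possible alert expression is sketched below. The 5-minute window and the threshold of 100 are arbitrary assumptions to be tuned per application; the expression would typically go in the `expr` field of a Prometheus alerting rule.

```promql
# Matches any service that logged more than 100 errors over roughly the last 5 minutes
# (window and threshold are illustrative, not recommended values).
sum by (service_name, service_namespace) (
  sum_over_time(
    loki:service:loglevel:count1m{detected_level="error"}[5m:1m]
  )
) > 100
```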
Best Practices
- Do not use the `increase()` or `rate()` functions with this metric, as it is already pre-aggregated over 1-minute intervals.
- For longer time ranges, use subquery syntax such as `[60m:1m]` to sample the metric at 1-minute intervals.
- Set alert thresholds based on your application's normal logging behavior.
- Combine with other metrics (such as HTTP status codes) for more comprehensive service health monitoring.