Monitoring

Monitor Kyverno policy metrics with Prometheus

Introduction

As a cluster administrator, it is beneficial to be able to monitor the state and execution of the Kyverno policies applied over your cluster. Tracking the applied policies, the changes associated with them, the activity associated with the incoming requests processed, and the results associated with policies can prove extremely useful as a part of cluster observability and compliance.

In addition, flexible monitoring of targets, from the rule or policy level up to the entire cluster level, gives you options to extract insights from the collected metrics.

Installation and Setup

When you install Kyverno via Helm, a service called kyverno-svc-metrics gets created inside the kyverno namespace, and this service exposes metrics on port 8000.

values.yaml

...
metricsService:
  create: true
  type: ClusterIP
  ## Kyverno's metrics server will be exposed at this port
  port: 8000
  ## The Node's port which will allow access Kyverno's metrics at the host level. Only used if service.type is NodePort.
  nodePort:
  ## Provide any additional annotations which may be required. This can be used to
  ## set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations: {}
...

By default, the service type is ClusterIP, meaning that the metrics can only be scraped by a Prometheus server running inside the cluster.
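For instance, an in-cluster Prometheus server can reach the service via its cluster DNS name. The snippet below is a minimal sketch of a static scrape configuration; the job name is arbitrary, and your setup may instead rely on Kubernetes service discovery or the Prometheus Operator:

scrape_configs:
  # Minimal static scrape job for the in-cluster kyverno-svc-metrics service.
  - job_name: kyverno
    static_configs:
      - targets: ["kyverno-svc-metrics.kyverno.svc.cluster.local:8000"]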

In many cases, the Prometheus server may sit outside the workload cluster as a shared service. In those scenarios, you will want the kyverno-svc-metrics service to be publicly exposed so that the metrics (available at port 8000) can be scraped by your Prometheus server outside the cluster.

Services can be exposed to external clients via an Ingress, or using LoadBalancer or NodePort service types.

To expose your kyverno-svc-metrics service publicly as NodePort at host’s/node’s port number 8000, you can configure your values.yaml before Helm installation as follows:

...
metricsService:
  create: true
  type: NodePort
  ## Kyverno's metrics server will be exposed at this port
  port: 8000
  ## The Node's port which will allow access Kyverno's metrics at the host level. Only used if service.type is NodePort.
  nodePort: 8000
  ## Provide any additional annotations which may be required. This can be used to
  ## set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations: {}
...
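With the NodePort configuration above, an external Prometheus server can scrape any node of the cluster on port 8000. A minimal sketch of such a scrape configuration, where <node-ip> is a placeholder for a reachable node address:

scrape_configs:
  - job_name: kyverno
    static_configs:
      # <node-ip> is a placeholder; port 8000 matches the nodePort configured above.
      - targets: ["<node-ip>:8000"]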

To expose the kyverno-svc-metrics service using a LoadBalancer type, you can configure your values.yaml before Helm installation as follows:

...
metricsService:
  create: true
  type: LoadBalancer
  ## Kyverno's metrics server will be exposed at this port
  port: 8000
  ## The Node's port which will allow access Kyverno's metrics at the host level. Only used if service.type is NodePort.
  nodePort:
  ## Provide any additional annotations which may be required. This can be used to
  ## set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations: {}
...
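Once the cloud provider assigns an external address to the LoadBalancer service, the external Prometheus server can scrape that address on port 8000, following the same scrape configuration pattern shown for the NodePort case above.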

Configuring the metrics

While installing Kyverno via Helm, you also have the opportunity to tweak and configure which metrics are exposed.

  • You can configure the set of namespaces for which metrics are exported by specifying which namespaces should be “included” and/or “excluded”. This configuration is useful when you want to exclude Kyverno metrics for noisy namespaces, such as test namespaces, which you deal with on a regular basis. At the same time, you can include certain namespaces if you want to monitor Kyverno-related activity for only certain critical namespaces. Exporting metrics for only the relevant namespaces, as opposed to all namespaces, can substantially reduce the memory footprint of Kyverno’s metrics exporter.
...
config:
  metricsConfig:
    namespaces: {
      "include": [],
      "exclude": []
    }
  # 'namespaces.include': list of namespaces to capture metrics for. Default: all namespaces included.
  # 'namespaces.exclude': list of namespaces to NOT capture metrics for. Default: [], none of the namespaces excluded.
...

“exclude” takes precedence over “include” in case a namespace is provided under both “include” and “exclude”.
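For example, to capture metrics only for a couple of critical namespaces while always skipping a test namespace, the configuration might look as follows (the namespace names are purely illustrative):

...
config:
  metricsConfig:
    # The namespace names below are illustrative placeholders.
    namespaces: {
      "include": ["payments", "orders"],
      "exclude": ["kyverno-test"]
    }
...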

  • You can also configure a metrics refresh interval, which periodically cleans up the metrics registry and all associated metrics of Kyverno’s metrics exporter, thereby resetting the memory footprint associated with it. This configuration is useful when you face a periodic need to reset and clean up Kyverno’s metrics exporter to tone down its memory footprint.
    Although Kyverno tries to minimize the cardinality of the metrics it exposes, it still exposes certain labels with slightly higher cardinality than others, such as policy_name and resource_namespace. When dealing with an extremely large number of namespaces/policies, the memory footprint of Kyverno’s metrics exporter might become heavy. Hence, this configuration proves handy in those situations.
...
config:
  # Rate at which metrics should reset so as to clean up the memory footprint of Kyverno's metrics, if you expect a high memory footprint.
  # Default: 0, no refresh of metrics
  metricsRefreshInterval: 24h
...

You would still not lose your previous metrics, as they are persisted in your Prometheus backend.

Metrics and Dashboard


Policies and Rule Counts

This metric can be used to track the number of policies and rules present in the cluster, both those that are currently active and those that are no longer active but were created in the past.

Policy and Rule Execution

This metric can be used to track the results associated with rules executed as part of incoming resource requests and background scans. This metric can be further aggregated to track policy-level results as well.
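As a sketch of such an aggregation, assuming the rule-level results are exported as a counter named kyverno_policy_results_total carrying a policy_name label (this metric name is an assumption and may differ across Kyverno versions), a Prometheus recording rule could roll the results up per policy:

groups:
  - name: kyverno-policy-results
    rules:
      - record: policy:kyverno_policy_results:sum
        # The metric name below is an assumption; verify it against your Kyverno version.
        expr: sum by (policy_name) (kyverno_policy_results_total)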

Policy Rule Execution Latency

This metric can be used to track the latencies associated with the execution/processing of individual rules whenever they evaluate incoming resource requests or execute background scans. This metric can be further aggregated to present latencies at the policy level.

Admission Review Latency

This metric can be used to track the end-to-end latency of each individual admission review, corresponding to an incoming resource request that triggers a set of policies and rules.

Admission Requests Counts

This metric can be used to track the number of admission requests processed by Kyverno.

Policy Change Counts

This metric can be used to track the history of all Kyverno policy-related changes, such as policy creations, updates, and deletions.

Grafana Dashboard

A ready-to-use dashboard for Kyverno metrics.
