Monitoring Kubernetes Clusters

Retya Mahendra
Nov 27, 2020 · 5 min read

Monitoring is vital, whether for web applications, databases, or Kubernetes clusters. It is about knowing the hows and whats of application or cluster performance.

Today let’s take a look at how we can set up Prometheus, Grafana, and kube-state-metrics to monitor a Kubernetes cluster and the resources running on it. Let’s start with some introductions.

Prometheus

Prometheus is a free, open-source event monitoring tool for containers and microservices. Prometheus collects numerical data as time series. The Prometheus server works on the principle of scraping: it invokes the metrics endpoint of the various nodes it has been configured to monitor. These metrics are collected at regular intervals, timestamped, and stored locally. The endpoint that gets scraped is exposed on the node.
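Under the hood, a metrics endpoint is just an HTTP handler that returns plain text in the Prometheus exposition format. The metric below is illustrative:

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027
http_requests_total{method="post",code="400"} 3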

Once your applications are equipped to provide data to Prometheus, we still need to tell Prometheus where to look for that data. Prometheus discovers targets to scrape by using service discovery. Your Kubernetes cluster already has labels, annotations, and an excellent mechanism for keeping track of changes and the status of its elements. Hence, Prometheus uses the Kubernetes API to discover targets.

The Kubernetes service discovery roles that you can expose to Prometheus are listed below, followed by a sample scrape configuration:

  • node
  • endpoint
  • service
  • pod
  • ingress
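
Putting service discovery together, a minimal scrape configuration in prometheus.yml might look like the sketch below. The job name and the prometheus.io/scrape annotation are a common convention, not something Prometheus mandates:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      ## keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true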

Prometheus retrieves machine-level metrics separately from the application information. The standard way to expose memory, disk space, CPU usage, and bandwidth metrics is to use a node exporter. Metrics about cgroups need to be exposed as well.
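
For example, assuming a node exporter is listening on its default port 9100, you can inspect the machine-level metrics it exposes directly (the value shown is illustrative):

$ curl -s http://localhost:9100/metrics | grep ^node_memory_MemAvailable
node_memory_MemAvailable_bytes 8.1985536e+09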

Grafana

Grafana is a multi-platform, open-source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. A Grafana Enterprise version with additional capabilities is also available, and Grafana is expandable through a plug-in system.

Grafana allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data-driven culture:

  • Visualize: Fast and flexible client side graphs with a multitude of options. Panel plugins offer many different ways to visualize metrics and logs.
  • Dynamic Dashboards: Create dynamic & reusable dashboards with template variables that appear as dropdowns at the top of the dashboard.
  • Explore Metrics: Explore your data through ad-hoc queries and dynamic drilldown. Split view and compare different time ranges, queries and data sources side by side.
  • Explore Logs: Experience the magic of switching from metrics to logs with preserved label filters. Quickly search through all your logs or stream them live.
  • Alerting: Visually define alert rules for your most important metrics. Grafana will continuously evaluate them and send notifications to systems like Slack, PagerDuty, VictorOps, and OpsGenie.
  • Mixed Data Sources: Mix different data sources in the same graph! You can specify a data source on a per-query basis. This works even for custom data sources.

Kube-State-Metrics

Kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. (See examples in the Metrics section below.) It is not focused on the health of the individual Kubernetes components, but rather on the health of the various objects inside, such as deployments, nodes and pods.

Kube-state-metrics is about generating metrics from Kubernetes API objects without modification. This ensures that features provided by kube-state-metrics have the same grade of stability as the Kubernetes API objects themselves. In turn, this means that in certain situations kube-state-metrics may not show the exact same values as kubectl, as kubectl applies certain heuristics to display comprehensible messages. Kube-state-metrics exposes raw data unmodified from the Kubernetes API; this way, users have all the data they require and can perform heuristics as they see fit.
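
As a quick sketch, once kube-state-metrics is running you can port-forward to its service and look at the raw metrics it generates. The service name below matches the chart release installed in the Configuration section:

$ kubectl -n monitoring port-forward svc/prometheus2-kube-state-metrics 8080:8080
##in another terminal
$ curl -s http://localhost:8080/metrics | grep ^kube_pod_status_phase | head -3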

Configuration

Let’s install Helm v3 using the following commands.

$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
##check helm version
$ helm version

Add the repository of stable charts. Here, we are adding the default stable Helm charts repository, so we can search for and install stable charts from it.

$ helm repo add stable https://charts.helm.sh/stable
$ helm search repo stable/prometheus

You will see output similar to this:

NAME                                  CHART VERSION  APP VERSION  DESCRIPTION
stable/prometheus                     11.12.1        2.20.1       DEPRECATED Prometheus is a monitoring system an...
stable/prometheus-adapter             2.5.1          v0.7.0       DEPRECATED A Helm chart for k8s prometheus adapter
stable/prometheus-blackbox-exporter   4.3.1          0.16.0       DEPRECATED A Helm chart for prometheus-nats-exp...
stable/prometheus-node-exporter       1.11.2         1.0.1        DEPRECATED A Helm chart for prometheus node-exp...
stable/prometheus-operator            9.3.2          0.38.1
......

Now create a namespace in the Kubernetes cluster:

$ kubectl create namespace monitoring

Here, we will install the Prometheus Operator for Kubernetes, which provides easy monitoring definitions for Kubernetes services along with deployment and management of Prometheus instances.

$ helm install prometheus2 stable/prometheus-operator --namespace monitoring
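
As an aside, you can also override chart values at install time instead of editing the services afterwards, as we do below. The value paths here are assumed from the chart's defaults, so verify them with helm show values stable/prometheus-operator:

$ helm install prometheus2 stable/prometheus-operator --namespace monitoring \
    --set grafana.service.type=NodePort \
    --set prometheus.service.type=NodePort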

Once the chart is installed, you can check the deployment with the following commands:

$ helm list -n monitoring
$ kubectl get pods -n monitoring
$ kubectl get svc -n monitoring

The result will look like:

#pods
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus2-prometheus-ope-alertmanager-0   2/2     Running   0          62m
prometheus-prometheus2-prometheus-ope-prometheus-0       3/3     Running   1          61m
prometheus2-grafana-757dbf44bd-7px6x                     2/2     Running   0          62m
prometheus2-kube-state-metrics-5b8b7dd64d-lmfc8          1/1     Running   0          62m
prometheus2-prometheus-node-exporter-45cbt               1/1     Running   0          62m
prometheus2-prometheus-node-exporter-5vwt7               1/1     Running   0          62m
prometheus2-prometheus-ope-operator-79d66779d5-fsxxx     1/1     Running   0          62m
#service
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   62m
prometheus-operated                       ClusterIP   None             <none>        9090/TCP                     62m
prometheus2-grafana                       ClusterIP   10.110.130.7     <pending>     80/TCP                       62m
prometheus2-kube-state-metrics            ClusterIP   10.105.179.166   <none>        8080/TCP                     62m
prometheus2-prometheus-node-exporter      ClusterIP   10.109.81.207    <none>        9100/TCP                     62m
prometheus2-prometheus-ope-alertmanager   ClusterIP   10.108.140.60    <none>        9093/TCP                     62m
prometheus2-prometheus-ope-operator       ClusterIP   10.111.172.32    <none>        8080/TCP                     62m
prometheus2-prometheus-ope-prometheus     ClusterIP   10.98.122.179    <none>        9090/TCP                     62m

Now change the service type of Prometheus and Grafana:

$ kubectl -n monitoring edit svc/prometheus2-prometheus-ope-prometheus
$ kubectl -n monitoring edit svc/prometheus2-grafana
## change the type from ClusterIP to NodePort
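
If you prefer a non-interactive approach, a kubectl patch achieves the same result, and you can read the assigned node ports back from the services afterwards (service names as created above):

$ kubectl -n monitoring patch svc prometheus2-prometheus-ope-prometheus -p '{"spec":{"type":"NodePort"}}'
$ kubectl -n monitoring patch svc prometheus2-grafana -p '{"spec":{"type":"NodePort"}}'
##look up the assigned node ports
$ kubectl -n monitoring get svc prometheus2-grafana prometheus2-prometheus-ope-prometheus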

You can then access Grafana at http://node_ip:node_port.

You can get the Grafana dashboard username and password from the prometheus2-grafana secret.

$ kubectl get secret --namespace monitoring prometheus2-grafana -o yaml

The result will look like:

apiVersion: v1
data:
  admin-password: cHJvbS1vcGVyYXRvcg==
  admin-user: YWRtaW4=
  ldap-toml: ""
kind: Secret
type: Opaque
.....

The secret data is base64-encoded; you can decode it with the following commands:

$ echo YWRtaW4= | openssl base64 -d
admin
$ echo cHJvbS1vcGVyYXRvcg== | openssl base64 -d
prom-operator
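
You can also combine the two steps into a single command using kubectl's jsonpath output; the data keys match the secret shown above:

$ kubectl -n monitoring get secret prometheus2-grafana -o jsonpath='{.data.admin-password}' | openssl base64 -d
prom-operator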

So, use that username and password to log in to Grafana. Grafana also provides several prebuilt dashboard templates for monitoring Kubernetes.

Conclusion

A monitoring system is very important, as the components it watches include the Kubernetes control plane. Remember that the Kubernetes cluster is responsible for keeping the correct number of elements across all of its deployments, daemonsets, persistent volume claims, and many other Kubernetes objects.

An issue in Kubernetes can compromise the scalability and resilience of the applications running in the cluster. A monitoring system can help you avoid complications that would otherwise be hard to detect. So, don't forget to monitor your control plane!
