In the last two articles we looked at some debugging options in Kubernetes. In this one, let's talk about a key issue when troubleshooting in Kubernetes: the lack of a built-in observability tool, which is widely seen as one of its significant downsides. As we know, logs, events and metrics are essential for troubleshooting and for managing resources and services.
In Kubernetes, we have to use third-party tools to fill this gap, so in this article let's look at some open-source tools for watching Kubernetes events.
Key Problem
As we know, applications running on a Kubernetes cluster are dynamic by nature: because they are ephemeral, the pods, ReplicaSets and Deployments in your cluster keep coming and going over time. It is important to be able to check what happened and why resources came and went. There are various reasons to need this, for example:
- to debug historic incidents
- to carry out common debugging tasks, such as:
  - finding information and events related to Kubernetes resources (pods, ReplicaSets, Deployments, etc.) that have been deleted
  - finding information about pods/ReplicaSets that were replaced by newer ones after a deployment update
  - getting details of pods evicted from a lost node
  - checking node availability and details of any lost nodes
  - knowing the rollout details of older deployments
  - discovering the hosts where pods from a previous deployment were running
  - retrieving the timings of pod replacements and their health checks
- long-term behavioral analysis of the workloads running on your Kubernetes cluster
- and so on…
In short, we need all the information about the events happening in our Kubernetes cluster.
About Kubernetes Events
Kubernetes events show what is happening in a cluster when a resource changes state or another part of the system reports an error. They tell you, for example, why the system cannot pull a container image or why some pods were evicted from the cluster. Events are themselves API resources: core components and extensions in the cluster create them automatically through the API server.
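Because events are ordinary API objects, their fields can be read directly with kubectl's custom-columns output. The sketch below defines two hypothetical helper functions (the names are illustrative, not part of kubectl): one prints the type, reason, affected object and message of each event in a namespace, and the other lists pod evictions across all namespaces, relying on the kubelet recording those events with the reason Evicted.

```shell
# Hypothetical helpers; the function names are illustrative, not part of kubectl.

# Show the key fields of each Event object in a namespace (default: "default").
event_table() {
  kubectl get events -n "${1:-default}" \
    -o custom-columns=TYPE:.type,REASON:.reason,KIND:.involvedObject.kind,NAME:.involvedObject.name,MESSAGE:.message
}

# List pod evictions in every namespace by filtering on the event's reason field.
evicted_events() {
  kubectl get events -A --field-selector reason=Evicted
}
```

Source the functions into your shell and run, for example, event_table my-namespace. The involvedObject columns are what tie each event back to the pod, node or other resource it describes.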
Although events are available by default, they have several limitations:
- Kubernetes events can generally only be accessed using kubectl.
- The default retention period for Kubernetes events is 1 hour.
- The retention period can be increased using the --event-ttl flag of kube-apiserver, but doing so can cause issues with the cluster's key-value store (etcd), which then has to hold many more event objects.
- There is no way to visualize these events.
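To check whether a cluster still uses the one-hour default, you can look for the flag in the API server's configuration. The function below is a sketch that assumes a kubeadm-style cluster, where kube-apiserver runs as a static pod defined in /etc/kubernetes/manifests/kube-apiserver.yaml on each control-plane node; managed services such as EKS, GKE and AKS generally do not expose this file. The function name event_ttl is hypothetical.

```shell
# Print the --event-ttl value kube-apiserver is configured with.
# Assumes a kubeadm-style static pod manifest (path can be overridden with $1).
event_ttl() {
  manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  if [ ! -r "$manifest" ]; then
    echo "cannot read $manifest" >&2
    return 1
  fi
  # Match an argument such as "- --event-ttl=3h" and keep only the value.
  ttl=$(grep -o -- '--event-ttl=[^[:space:]"]*' "$manifest" | head -n 1 | cut -d= -f2)
  if [ -n "$ttl" ]; then
    echo "$ttl"
  else
    # Flag absent: kube-apiserver falls back to its built-in default of one hour.
    echo "1h0m0s (default)"
  fi
}
```

Run it on a control-plane node; you may need elevated permissions to read the manifest.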
Accessing Kubernetes Events
As we saw in the previous section, Kubernetes has no built-in support for accessing, storing, or forwarding events over the long term. Events are retained for a short time and then cleaned up. If you need them for longer, you have to integrate a logging tool; otherwise you can access them only via kubectl.
Running the kubectl describe command on a specific resource lists the events for that resource. A more generic way is to run the kubectl get events command, which lists events for the current namespace, or for the entire cluster when you add -A. To watch events as they arrive, for example during a deployment, run kubectl get events --watch.
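These commands can be wrapped into a few hypothetical shell helpers for everyday queries. The flags used (--sort-by, --field-selector, -A) are standard kubectl get options; the function names are illustrative. Sorting on .metadata.creationTimestamp is a choice made here because lastTimestamp can be empty for some events, such as those emitted through the newer events.k8s.io API.

```shell
# Hypothetical helpers around "kubectl get events"; names are illustrative.

# All events in one namespace, oldest first.
ns_events() {
  kubectl get events -n "$1" --sort-by=.metadata.creationTimestamp
}

# Events for a single object, e.g. object_events default Pod web-0.
object_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.kind=$2,involvedObject.name=$3"
}

# Only Warning events, across every namespace.
warning_events() {
  kubectl get events -A --field-selector type=Warning
}
```

object_events default Pod web-0 returns roughly the same events as the Events section of kubectl describe pod web-0, but in the tabular form of kubectl get.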
To address a few of the problems mentioned above, tools like Kubewatch, Eventrouter and Event-exporter have been developed.
Kubewatch: a Kubernetes watcher that publishes notifications to collaboration hubs and notification channels. Run it in your cluster, and you will receive event notifications through webhooks.
Eventrouter: a simple event router for Kubernetes. It actively watches Event resources in the cluster and pushes them to a user-specified sink. This is useful for a number of purposes, most notably long-term behavioral analysis of the workloads running on your Kubernetes cluster.
Event-exporter: a bridge from Kubernetes events to Prometheus. It is a collector that lists and watches Kubernetes events and, based on when each event occurs, determines how long it lasts. This information is translated into metrics, from which you can create alerts with Alertmanager or visualization dashboards with Grafana.
The tools mentioned above are a good way to handle most of the challenges posed by Kubernetes events. However, none of them is a standalone solution, and they leave a lot of work to you as the end user: you also need to configure other tools to store and visualize the events.
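To make the idea behind these watchers concrete, the sketch below approximates the core of what Eventrouter does (watch events, push them to a sink) using only kubectl, with a local file as the sink. It is an illustration of why dedicated tooling is needed rather than a replacement: it has no log rotation and no reconnect logic, the API server may close the watch after a while, and restarting it re-lists events that are still stored, producing duplicates. The function name archive_events is hypothetical.

```shell
# Minimal, hypothetical sketch: append every Event in the cluster, plus each
# later change, to a local file so it survives the API server's --event-ttl cleanup.
archive_events() {
  out="${1:-events.json}"
  # --watch first lists the events currently stored, then streams changes;
  # -o json writes each event as a JSON object.
  kubectl get events -A --watch -o json >> "$out"
}
```

Storing and visualizing that file is exactly the additional work the paragraph above describes.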
Sloop: Your Ultimate and Easy Solution
Sloop is an independent solution that monitors, stores, and visualizes events and changes in Kubernetes resources over time. It is designed to provide a timeline of updates made to existing resources and resources that no longer exist in the cluster.
The visual dashboard also allows for easy inspection of event metrics for debugging and error handling purposes.
Key Features
- Allows you to find and inspect resources that no longer exist in your Kubernetes cluster
- Helps in answering almost all the queries mentioned at the beginning of this blog
- Provides a timeline display that shows rollouts of related resources in updates to Deployments, ReplicaSets and StatefulSets
- Helps in debugging transient and intermittent errors
- Allows you to see changes over time in a Kubernetes application
- Is a self-contained service with no dependencies on distributed storage
Architecture
Installation
Sloop can be installed using Helm, from precompiled binaries, or by building from source.
All methods require a running Kubernetes cluster and the KUBECONFIG environment variable set up. To install with Helm:
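Before installing, a quick preflight check can confirm these prerequisites. This is a hypothetical helper, not part of Sloop; it only checks that KUBECONFIG is set, that the cluster answers, and that the helm binary is on the PATH.

```shell
# Hypothetical preflight check for the prerequisites listed above.
sloop_preflight() {
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is not set" >&2
    return 1
  fi
  # cluster-info fails if the API server of the current context is unreachable.
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "cannot reach the cluster configured in $KUBECONFIG" >&2
    return 1
  fi
  if ! command -v helm >/dev/null 2>&1; then
    echo "helm is not installed" >&2
    return 1
  fi
  echo "ok"
}
```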
# git clone https://github.com/salesforce/sloop
# cd sloop/helm
# kubectl create ns sloop
# helm install sloop -n sloop ./sloop
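When helm install returns, the Sloop pods may still be starting. One way to wait for them is kubectl wait; the sketch below waits for every pod in the sloop namespace rather than assuming the chart's pod names, and the three-minute timeout is an arbitrary choice. The function name is hypothetical.

```shell
# Block until every pod in the namespace (default: sloop) reports Ready,
# or fail after the timeout (default: 180s, an arbitrary choice).
sloop_wait_ready() {
  kubectl wait --for=condition=Ready pods --all \
    -n "${1:-sloop}" --timeout="${2:-180s}"
}
```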
Docker container
Refer to this document to run Sloop as a standalone Docker container.
If you installed Sloop with Helm, use kubectl's port-forward command to access the dashboard:
kubectl port-forward -n sloop service/sloop 8080:80
and visit http://localhost:8080/ to view the dashboard.
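The port-forward and browser steps can be combined into one hypothetical helper that starts the tunnel in the background and probes the dashboard with curl before printing the URL. It assumes the service/sloop name and port 80 used above, and that curl is installed.

```shell
# Hypothetical helper: start the port-forward in the background and confirm
# the Sloop dashboard answers before printing its URL.
sloop_dashboard() {
  port="${1:-8080}"
  kubectl port-forward -n sloop service/sloop "${port}:80" >/dev/null 2>&1 &
  pf_pid=$!
  # The tunnel takes a moment to come up, so retry for about five seconds.
  for attempt in 1 2 3 4 5; do
    if curl -fs -o /dev/null "http://localhost:${port}/"; then
      echo "Sloop dashboard: http://localhost:${port}/ (port-forward PID ${pf_pid})"
      return 0
    fi
    sleep 1
  done
  kill "$pf_pid" 2>/dev/null
  echo "no response on port ${port}; port-forward stopped" >&2
  return 1
}
```

Stop the tunnel later with kill and the PID that the helper prints.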
Sloop provides a timeline of your Kubernetes resources, along with several filters for narrowing down what is shown.
With Sloop, you can filter Kubernetes resources by time range, namespace, kind of resource (such as Pod, PVC or Node) and resource name, and sort events by different options. Selecting a particular resource in the timeline shows the events that occurred on that resource at that moment, which lets you review all the past events that happened on it in your cluster.
For more information, check out the Sloop project on GitHub.
See also:
How to debug a Kubernetes service deployment – FoxuTech
Common Kubernetes troubleshooting tasks – FoxuTech