
Troubleshooting Kubernetes Networking – Part 1

Troubleshooting Kubernetes Networking: As part of our series on Kubernetes issues and how to fix them, today we are going to look at another important topic: networking. Since this is a vast topic to cover, we will split it into parts and provide the solutions one by one. In this post, we will take some example scenarios and show how to fix them.

In this part we will first cover the basics and a few related topics, then look at some very basic issues and how to fix them. In the coming posts we will look at more advanced issues and their fixes.

Basics

Services

Before we proceed, let’s review what a Kubernetes Service is. A Kubernetes Service is a logical abstraction for a deployed group of pods in a cluster (which all perform the same function). Since pods are ephemeral, a Service enables a group of pods that provide specific functions (web services, image processing, etc.) to be assigned a name and a unique IP address (clusterIP). As long as the Service is running, that IP address will not change. Services also define policies for their access.

Types of Kubernetes Services

  • ClusterIP. Exposes a service which is only accessible from within the cluster.
  • NodePort. Exposes a service via a static port on each node’s IP.
  • LoadBalancer. Exposes the service via the cloud provider’s load balancer.
  • ExternalName. Maps the service to the contents of the externalName field (an external DNS name) by returning a CNAME record with that value.
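
As a quick, hedged illustration of the first two types, the commands below expose a hypothetical Deployment named web (listening on container port 8080) as a ClusterIP Service and as a NodePort Service; the names and ports are placeholders for this example.

# kubectl expose deployment web --port=80 --target-port=8080 --type=ClusterIP --name=web-svc
# kubectl expose deployment web --port=80 --target-port=8080 --type=NodePort --name=web-nodeport
# kubectl get svc -o wide

kubectl get svc shows the assigned clusterIP (and the node port for the NodePort Service); these are the addresses we will be testing against in the troubleshooting sections below.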

Container Network Interface

As we know, every Pod in a cluster gets its own unique cluster-wide IP address. This means you do not need to explicitly create links between Pods, and you almost never need to deal with mapping container ports to host ports. This creates a clean, backwards-compatible model where Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.

Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):

  • pods can communicate with all other pods on any other node without NAT
  • agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
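
A quick way to verify the first requirement is to ping one pod directly from another (assuming the container image includes ping); the pod name and IP below are placeholders taken from a kubectl get pods -o wide listing.

# kubectl get pods -o wide
# kubectl exec -it web-7d4b9c -- ping -c 3 10.244.1.15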

Kubernetes networking addresses four concerns:

  1. Containers within a Pod use networking to communicate with each other via loopback.
  2. Cluster networking provides communication between different Pods.
  3. The Service API lets you expose an application running in Pods so that it is reachable from outside your cluster.
  4. You can also use Services to publish services only for consumption inside your cluster.

CNI Plugins:

Here are some of the most familiar and widely used CNI plugins:

  1. Flannel
  2. Calico
  3. Cilium
  4. WeaveNet
  5. Canal

Summary Matrix

| Feature                      | Flannel   | Calico            | Cilium      | Weavenet  | Canal     |
|------------------------------|-----------|-------------------|-------------|-----------|-----------|
| Mode of Deployment           | DaemonSet | DaemonSet         | DaemonSet   | DaemonSet | DaemonSet |
| Encapsulation and Routing    | VxLAN     | IPinIP, BGP, eBPF | VxLAN, eBPF | VxLAN     | VxLAN     |
| Support for Network Policies | No        | Yes               | Yes         | Yes       | Yes       |
| Datastore used               | Etcd      | Etcd              | Etcd        | No        | Etcd      |
| Encryption                   | Yes       | Yes               | Yes         | Yes       | No        |
| Ingress Support              | No        | Yes               | Yes         | Yes       | Yes       |
| Enterprise Support           | No        | Yes               | No          | Yes       | No        |

How to choose a CNI Provider?

There is no single CNI provider that meets all project needs, but here are some details about each provider. For easy setup and configuration, Flannel and Weavenet provide great capabilities. Calico is better for performance since it uses an underlay network through BGP. Cilium utilizes a completely different application-layer filtering model through eBPF and is more geared towards enterprise security.
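
If you are not sure which CNI plugin a cluster is running, a quick check (a rough heuristic, not an official interface) is to look at the CNI configuration directory on a node and at the plugin pods in the kube-system namespace; the exact file and pod names depend on the plugin and its version.

# ls /etc/cni/net.d/
# kubectl get pods -n kube-system -o wide | egrep -i 'flannel|calico|cilium|weave|canal'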

Basic Troubleshooting

Traffic

As seen above, Kubernetes supports a variety of networking plugins, and each fails in its own way. To troubleshoot networking issues we should understand the core plumbing. Kubernetes relies on the Netfilter kernel framework to set up low-level cluster IP load balancing. This requires two critical kernel settings: IP forwarding and bridge netfilter.

Kernel IP forwarding

IP forwarding is a kernel setting that allows traffic coming in on one interface to be routed out through another interface. This setting is necessary for the Linux kernel to route traffic from containers to the outside world.

What it causes

Sometimes this setting gets reset by a security team while running security scans/enforcements or making system changes, or it has not been configured to survive a reboot. When this happens, networking starts failing.

Pod to service connection times out:

* connect to 10.0.21.231 port 3000 failed: Connection timed out
* Failed to connect to 10.0.21.231 port 3000: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to 10.0.21.231 port 3000: Connection timed out

Tcpdump could show that lots of repeated SYN packets are sent, but no ACK is received.
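
For example, a capture filtered on the service IP and port from the curl output above should show the repeated SYN packets with no replies (the IP and port are just from this example):

# tcpdump -ni any host 10.0.21.231 and tcp port 3000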

How to diagnose

Check that ipv4 forwarding is enabled

# sysctl net.ipv4.ip_forward

0 means that forwarding is disabled

net.ipv4.ip_forward = 0

How to fix

This will turn forwarding back on for a running server:

# sysctl -w net.ipv4.ip_forward=1

On CentOS/RHEL this will make the setting persist across reboots:

# echo net.ipv4.ip_forward=1 >> /etc/sysctl.d/10-ipv4-forwarding-on.conf
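
To apply the persisted files immediately without waiting for a reboot, sysctl can re-read all configuration files:

# sysctl --system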

Bridge-netfilter

The bridge-netfilter setting enables iptables rules to work on Linux bridges just like the ones set up by Docker and Kubernetes. This setting is necessary for the Linux kernel to be able to perform address translation in packets going to and from hosted containers.

What it causes

Network requests to services outside the Pod network will start timing out with destination host unreachable or connection refused errors.

How to diagnose

Check that bridge netfilter is enabled

# sysctl net.bridge.bridge-nf-call-iptables

0 means that bridge netfilter is disabled

net.bridge.bridge-nf-call-iptables = 0

How to fix

Note that some distributions may have this module compiled into the kernel. Check whether it is built in, and load the module if it is not:

# cat /lib/modules/$(uname -r)/modules.builtin | grep netfilter
# modprobe br_netfilter

Turn the iptables setting on

# sysctl -w net.bridge.bridge-nf-call-iptables=1
# echo net.bridge.bridge-nf-call-iptables=1 >> /etc/sysctl.d/10-bridge-nf-call-iptables.conf
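
If the module is not built into the kernel, you may also want it to load automatically on boot; on systemd-based distributions this is typically done with a modules-load.d entry, and the sysctl files can be re-applied the same way as before:

# echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
# sysctl --system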

Firewall rules block overlay network traffic

Kubernetes provides a variety of networking plugins that enable its clustering features while providing backwards compatible support for traditional IP and port based applications.

One of the most common on-premises Kubernetes networking setups leverages a VxLAN overlay network, where IP packets are encapsulated in UDP and sent over port 8472.
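
You can confirm the VxLAN interface and the UDP port your overlay actually uses by inspecting the interface details on a node. The interface name flannel.1 below assumes Flannel; other plugins name it differently (for example vxlan.calico or cilium_vxlan).

# ip -d link show flannel.1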

What it causes

There is 100% packet loss between pod IPs, showing up either as lost packets or as destination host unreachable errors.

$ ping 10.22.192.108
PING 10.22.192.108 (10.22.192.108): 56 data bytes
--- 10.22.192.108 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss

How to diagnose

It is better to test with the same protocol that carries the actual data, as firewall rules can be protocol specific, e.g. they could be blocking only UDP traffic.

iperf could be a good tool for that:

on the server side

# iperf -s -p 5432 -u

on the client side

# iperf -c 10.22.192.108 -u -p 5432 -b 1K
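
If the iperf server never receives any data, you can also capture on the destination node to check whether the encapsulated packets arrive at all; the interface name below is a placeholder, and 8472 is the VxLAN port from the example above.

# tcpdump -ni eth0 udp port 8472

If nothing shows up here while the other node is sending, something in between (a host firewall, security group or the physical network) is dropping the UDP traffic.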

How to fix

Update the firewall rules to stop blocking the overlay traffic.
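
As a hedged example only, a host-level iptables rule that permits VxLAN traffic from the node subnet could look like the one below; 10.22.0.0/16 is a placeholder for your actual node network, and the rule has to be added wherever the traffic is actually being blocked (iptables, firewalld, a cloud security group, etc.).

# iptables -A INPUT -p udp -s 10.22.0.0/16 --dport 8472 -j ACCEPT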

Pod CIDR conflicts

Kubernetes sets up a special overlay network for container-to-container communication. With an isolated pod network, containers get unique IPs and avoid port conflicts on a cluster. Problems arise when the Pod network subnets start conflicting with the host networks.

What it causes

Pod to pod communication is disrupted with routing problems.

# curl http://172.24.28.32:3000
curl: (7) Failed to connect to 172.24.28.32 port 3000: No route to host

How to diagnose

Start with a quick look at the allocated pod IP addresses:

# kubectl get pods -o wide

Compare the host IP ranges with the Kubernetes subnets specified in the apiserver:

# ip addr list

The IP address range could be specified in your CNI plugin configuration or in the kubenet pod-cidr parameter.
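
A couple of ways to see which pod CIDR the cluster is actually configured with: the jsonpath query below prints the per-node podCIDR assignment, and on kubeadm-based clusters the controller-manager --cluster-cidr flag also shows up in a cluster-info dump.

# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# kubectl cluster-info dump | grep -m1 -- --cluster-cidr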

How to fix

Double-check which RFC1918 private network subnets are in use in your network, VLAN or VPC, and make certain that there is no overlap.

Once you detect the overlap, update the Pod CIDR to use a range that avoids the conflict.
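
How you change the pod CIDR depends on how the cluster was built. As one hedged example, on a kubeadm-based cluster the range is chosen at init time, and most CNI plugins must be configured to use the same range:

# kubeadm init --pod-network-cidr=192.168.0.0/16

For an existing cluster, changing the pod CIDR usually means updating both the CNI plugin configuration and the controller-manager --cluster-cidr setting, which is disruptive; it is often simpler to rebuild the cluster with a non-overlapping range.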


You can follow us on social media to get short pieces of knowledge like this regularly.
