

Kubernetes, Cilium, and Network Connectivity

Estimated time to read: 11 minutes

  • Originally Written: April, 2025

Overview

This post documents a few details about Kubernetes networking, the Cilium CNI, default settings versus tuning for performance, and an example of how to connect Kubernetes to Cisco ACI.

Kubernetes Networking

For more general information on Kubernetes networking please have a look at the following link.

https://tl10k.dev/categories/kubernetes/kubernetes-networking-intro/part-1/

We first need access to the Kubernetes cluster before the network can be configured. This can be achieved by connecting via the kubectl CLI tool.

If you're completely new to Kubernetes you might also want to read through the following guides.

There will be a few example configurations provided as YAML. The Kubernetes configuration uses a standard format.

The ACI configuration uses the Network as Code project which makes it easy to deploy a network through a YAML file.

Kubernetes Networking and the CNI

This post will look at the two main components of Kubernetes networking: the CNI and the Kubernetes Service. First we will look at the defaults, and then see how to increase performance, visibility, and security.

Kubernetes is an orchestration platform used to automate the deployment, scaling, and management of containerized applications. However, it does not manage all of this by itself. Instead, Kubernetes "outsources" some functions, such as deploying the containers themselves, some networking, and storage configuration, to third-party applications.

For example, Kubernetes makes requests to a container runtime such as containerd or CRI-O, which is the software that actually starts, stops, and deletes the containers. The interface Kubernetes uses to talk to the runtime is the Container Runtime Interface, or CRI.
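
As a quick check, the runtime in use on each node is reported in the node status, so a command along these lines (standard kubectl, nothing specific to this post's cluster) will show it:

kubectl get nodes -o custom-columns=NODE:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion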

Similarly, Kubernetes uses a third-party plugin such as Calico or Cilium to implement some of the networking in a Kubernetes cluster. This is known as the Container Network Interface, or CNI.

The CNI (e.g. Cilium) is responsible for assigning an IP address to each pod or Kubernetes Service, creating interfaces on the Kubernetes nodes/pods, and setting up the routing or tunnels that ensure all pods within a cluster can communicate. Some CNIs provide additional functionality. For example, Cilium is also able to assign IP addresses to Services of type LoadBalancer rather than having to rely on additional software such as MetalLB.

The CNI typically runs as a pod on each Kubernetes node. In the example of Cilium, the cilium-agent process runs in the cilium pod on each node to manage the eBPF programs used to provide network connectivity for the cluster.
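
If you want to confirm the agent is present on every node, you can list the Cilium pods. The k8s-app=cilium label used below is the one Cilium's DaemonSet applies by default; adjust it if your installation labels things differently:

kubectl -n kube-system get pods -l k8s-app=cilium -o wide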

Image reference: Cilium Component Overview

Each node receives a PodCIDR block and the CNI programs each new pod with an IP from this block. You can use the following command to find the pod CIDR assigned to each node.

kubectl get nodes -o custom-columns=NODE:.metadata.name,PODCIDR:.spec.podCIDR
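
The output will look something like the following; the node names and CIDR ranges here are placeholders and will differ in your cluster:

NODE      PODCIDR
node-1    10.0.0.0/24
node-2    10.0.1.0/24
node-3    10.0.2.0/24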

The routes and interfaces are also configured by the CNI.

The default settings

All the CNIs I've worked with set their default configurations to provide the most compatibility rather than the highest performance. With those defaults in place, how does one pod talk to a pod on a different node?

The easiest option is to let the CNI set up the connectivity for you, rather than having to configure routing to the upstream network and advertise your networks yourself. Cilium, Flannel, and Weave use VXLAN tunnels to encapsulate traffic on the node and transport it across the network to the destination node. Calico can use either IP-in-IP or VXLAN encapsulation.
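
If you're running Cilium, one way to check which mode your cluster is using is to look at the cilium-config ConfigMap. The exact key names have changed between Cilium releases (tunnel, tunnel-protocol, routing-mode), so a broad grep like the one below is a reasonable starting point:

kubectl -n kube-system get configmap cilium-config -o yaml | grep -E 'tunnel|routing-mode'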

By encapsulating the traffic within the Kubernetes cluster, the upstream network only needs to understand how to reach the destination node, not how to reach the service or pod network. This provides a simple way to connect the nodes and pods together. However, there's a tradeoff: you lose visibility into the traffic (as it is encapsulated) and aren't able to apply security policies as granularly. There's also a slight performance overhead, which can be an issue in some environments.

The second half of the post will look at how to tune the environment to address these tradeoffs.

Should I care about pod IPs?

In a Kubernetes cluster you may have more than a single replica of a workload, e.g. I might have 10 pods serving my application, and when I need to scale up for performance I just add another 10. In that case, to which pod IP address should I connect?

Kubernetes Services abstract away this problem and help you to track and manage connectivity to your pods. They are a native concept to Kubernetes, meaning they do not rely on an external plugin as we saw with pod communication.

There are a few key benefits that Kubernetes Services provide.

  • Tracking pods
  • Providing access within the cluster
  • Providing access from outside the cluster

Tracking pods

Labels and selectors are very important concepts in Kubernetes.

Labels are key/value pairs that are attached to objects, such as pods [and] are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users. Unlike names and UIDs, labels do not provide uniqueness. In general, we expect many objects to carry the same label(s)

Via a label selector, the client/user can identify a set of objects

https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

In this case we can create a new Service and track any pods with a specific label (app: hello-node in the following example). Kubernetes will then maintain a list of the matching pods (endpoints), including their pod IPs.
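
As a sketch of what that looks like, a minimal Service selecting those pods could be defined as follows; the port numbers are placeholders for this example:

apiVersion: v1
kind: Service
metadata:
  name: hello-node
spec:
  selector:
    app: hello-node
  ports:
    - port: 80
      targetPort: 8080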

Internal and external connectivity

Each Kubernetes Service will be assigned an internal ClusterIP and a DNS record will be configured (the general format is <service-name>.<namespace>.svc.cluster.local, but this may change depending on how your K8s cluster is configured).

You can then access this ClusterIP or DNS record from any pod within the cluster, rather than having to know the IPs of the individual pods.
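
For example, assuming the Service is called hello-node and lives in the default namespace, you could test name resolution from a throwaway pod:

kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup hello-node.default.svc.cluster.local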

Additionally, Kubernetes Services can be used to provide access to pods from an external client (e.g. another application or a user). See Part 3 of the Kubernetes Networking post for more details on the available options. This post will show Cilium load balancing as an option in a later section.

But what are Kubernetes Services and what creates them? You may have seen a kube-proxy pod running on each of your nodes when you deploy a cluster, and it's this component which implements Services. When a Service is created or changed, kube-proxy configures iptables rules on each of the Kubernetes nodes. These rules direct traffic hitting the ClusterIP to one of the pods with the specified label.

As you see in the screenshot below, there is an element of randomness when selecting which pod the traffic should be sent to. The iptables statistic module is used to allow traffic to be randomly selected based on a specified probability.

This random selection is performed on each node using the kernel's pseudo-random number generator, and therefore there is no consistency in which pod will be selected, i.e. if traffic is sent from a client to two nodes it may be redirected to two different backend pods. In some environments it may be required that a client connection is always sent to the same backend pod, regardless of the ingress node. This is possible with consistent hashing, which is covered in a later section.

There are four pods associated with the hello-node service in the following screenshot. You can also see four rules (last four lines).
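
You can dump the service chain on a node to see these rules yourself (the chain name can be found by searching the KUBE-SERVICES chain in the nat table for the service's ClusterIP). The sketch below is illustrative rather than copied from a real cluster: the KUBE-SVC/KUBE-SEP chain hashes are placeholders, but the statistic matches and probabilities are what kube-proxy programs for four endpoints.

sudo iptables -t nat -S KUBE-SVC-ABCDEFGHIJKLMNOP
-A KUBE-SVC-ABCDEFGHIJKLMNOP -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-AAAAAAAAAAAAAAAA
-A KUBE-SVC-ABCDEFGHIJKLMNOP -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-BBBBBBBBBBBBBBBB
-A KUBE-SVC-ABCDEFGHIJKLMNOP -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-CCCCCCCCCCCCCCCC
-A KUBE-SVC-ABCDEFGHIJKLMNOP -j KUBE-SEP-DDDDDDDDDDDDDDDD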

  • Initial Rule (0.25 probability)
    • When there are four pods, the traffic needs to be evenly split between them
    • The statistic module matching is based on a uniform distribution, where the probability of a packet being selected is defined by the --probability parameter. It is a value between 0 and 1. For example, --probability 0.5 means each packet has a 50% chance of being matched
    • The first rule is set with a --probability 0.25 to capture 25% of the traffic and send it to the first pod
    • When traversing the iptables rules, if this rule is matched based on the random selection it's sent to the associated endpoint (KUBE-SEP)
    • If it's not selected it proceeds to the second rule
  • Second Rule (0.33 probability)
    • If the previous rule had matched, the traffic would have been sent to the first pod
    • Since we are now at the second rule it means it didn't match, and we therefore need to select one of the remaining three pods
    • Therefore in this rule the probability is split among the three remaining pods, or --probability 0.33
    • Like the previous, if this rule is selected the packet is sent to the associated endpoint (KUBE-SEP)
    • If not it proceeds to the third rule
  • Third Rule (0.5 probability)
    • There are now two remaining pods or --probability 0.5
    • If it's selected go to the KUBE-SEP endpoint rule
    • Otherwise go to the final rule
  • Final Rule (No probability)
    • If we get to this point then the first three rules were not selected
    • This means there's only one more pod and we don't need a probability rule
    • This is because any traffic that doesn't match earlier rules will fall through to the next rule or destination, making the fourth pod the default recipient of the remaining traffic

You may also have seen masquerading (source NAT) rules alongside the Network Address Translation (NAT) that sends traffic to the Kubernetes endpoint. In the endpoint (KUBE-SEP) rules, the destination IP of the packet is changed from the Service IP to the pod IP.
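
For completeness, the KUBE-SEP chain for a single endpoint looks roughly like the sketch below (again with placeholder hashes and a placeholder pod IP/port): the first rule marks traffic sourced from the pod itself for masquerading, and the second performs the DNAT to the pod.

-A KUBE-SEP-AAAAAAAAAAAAAAAA -s 10.0.1.25/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-AAAAAAAAAAAAAAAA -p tcp -m tcp -j DNAT --to-destination 10.0.1.25:8080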

Tuning for performance, visibility, and security

The second half of the post will cover
