
How to stabilize Calico's IP-in-IP tunnels in virtual environments

Estimated time to read: 5 minutes

When you work with bleeding-edge technology, you can expect the unexpected

As most of us know, software is never without bugs, and given the sheer diversity of the technology we depend on, most of us won’t be able to fix these bugs ourselves. Instead, we can develop and deploy workarounds while we wait for specialists to release a fix.

In this post, I’d like to share how we resolved a connectivity issue between pods in our Kubernetes cluster due to a tunneling issue in Calico.

Public Cloud

At Fuga Cloud we run a public cloud based on the free and open-source cloud computing platform OpenStack. We initially built this public cloud for our users to set up and manage their own infrastructure, but there has also been internal company demand for a similar service. We eat our own dog food.

Currently, we’re working on a continuous deployment pipeline to run OpenStack in containers. For fast iterations, we deploy to virtual hardware on our own public cloud. The containers are orchestrated by Kubernetes, and inter-container connectivity is handled by Calico. Because we run on virtual hardware, we use Calico’s IP-in-IP tunneling.

IP in IP is an IP tunnelling protocol that encapsulates one IP packet in another IP packet. (source wikipedia.org)
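
For reference, enabling IP-in-IP in Calico typically comes down to setting it on the IP pool. A minimal sketch, assuming the projectcalico.org/v3 IPPool resource; the pool name and CIDR below are illustrative, and older Calico releases configure the same thing through the CALICO_IPV4POOL_IPIP environment variable on calico-node:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16   # adjust to your cluster's pod network
  ipipMode: Always       # encapsulate pod-to-pod traffic in IP-in-IP
  natOutgoing: true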

Problem

The problem we faced using Calico IP-in-IP tunnels in a virtual environment was that Kubernetes pods sometimes couldn’t connect to one another during the initialization phase. Somehow the IP-in-IP tunnels between pods weren’t properly initialized, causing the pods to get stuck in a crash loop. During troubleshooting, and after many deployment runs, we discovered that sending ICMP packets between pods across the cluster resolved the IP-in-IP network issues we were having.
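
A quick way to reproduce this by hand, before automating it, is to exec into a pod and ping the IP of a pod on another node. The pod name and IP below are hypothetical, and the pod image is assumed to ship a ping binary:

$ kubectl get pods --all-namespaces -o wide
$ kubectl -n default exec some-pod -- ping -c 1 -w 1 10.100.1.5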

The success rate of our continuous deployments went from 60% to 100%.

Workaround

Our current workaround is to deploy a pod on each Kubernetes node that sends a single ICMP packet to each pod in the cluster. To deploy these pods we used some core Kubernetes features, narrowing the workaround down to a single configuration file of no more than 30 lines.

Our resulting configuration after some iterations:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: pokepods
  namespace: kube-system
  labels:
    app: pokepods
spec:
  template:
    metadata:
      labels:
        app: pokepods
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sh"]
        args:  ["-c", "PATH=$PATH:/host/usr/bin; while true; do kubectl get pods --all-namespaces -o go-template='range .itemsif (and (.status.podIP) (ne .metadata.namespace \"kube-system\"))ping -c 1 -w 1 .status.podIP || true;{{ end }}{{ end }}' |sh; sleep 5; done"]
        volumeMounts:
        - mountPath: /host/usr/bin
          name: kubectl-path
          readOnly: true
      volumes:
      - name: kubectl-path
        hostPath:
          path: /usr/bin 

To deploy the pods in the Kubernetes cluster:

$ kubectl create -f ./manifest.yml
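
Once created, the DaemonSet schedules one pokepods pod per node, which can be verified with:

$ kubectl -n kube-system get daemonset pokepods
$ kubectl -n kube-system get pods -l app=pokepods -o wide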

Breaking it down

Kubernetes configuration can be written in manifest files in YAML or JSON format. The first three fields in the following excerpt, apiVersion, kind, and metadata, are required for all Kubernetes configurations. The manifest complies with the declared API version, and we deploy a DaemonSet resource in the kube-system namespace.

A DaemonSet ensures that all (or some) Nodes run a copy of a Pod (source: kubernetes.io)

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: pokepods
  namespace: kube-system
  labels:
    app: pokepods

The image we use for our container is busybox. This image contains all the basic Unix tools we need to execute a shell script and ping other pods. Using the stock busybox image frees us from building and registering our own container image.

spec:
  template:
    metadata:
      labels:
        app: pokepods
    spec:
      containers:
      - name: busybox
        image: busybox

When all you have is a hammer

The busybox container runs a single process, configured in the command field. This process loops through all the running pods in the Kubernetes cluster and sends them an ICMP packet to jumpstart the IP in IP tunnel configured through Calico.

        command: ["/bin/sh"]
        args:  ["-c", "PATH=$PATH:/host/usr/bin; while true; do kubectl get pods --all-namespaces -o go-template='...' |sh; sleep 5; done"]

To know which pods to send ICMP packets to, we need to query the Kubernetes API with the Kubernetes client kubectl. We could install this client in our container, but because it is pre-installed on all our Kubernetes nodes, we simply mount the host directory containing the binary into our container. An additional benefit is that the Kubernetes client will always match the Kubernetes API version of the node.

        volumeMounts:
        - mountPath: /host/usr/bin
          name: kubectl-path
          readOnly: true
      volumes:
      - name: kubectl-path
        hostPath:
          path: /usr/bin

By default, the Kubernetes client kubectl returns output in a human-readable format, and it also supports machine-readable formats like YAML and JSON. Another option is to use the built-in templating system, with which you can do insane things like build shell scripts. The following excerpt loops through the pods and selects those which have an IP address and are not in the kube-system namespace.

    {{ range .items }}
        {{ if (and (.status.podIP) (ne .metadata.namespace "kube-system")) }}
            ping -c 1 -w 1 {{ .status.podIP }} || true;
        {{ end }}
    {{ end }}
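
Rendered against a cluster with, say, two eligible pods, the template produces a tiny shell script along these lines (the IP addresses are made up for illustration):

ping -c 1 -w 1 10.100.0.7 || true;ping -c 1 -w 1 10.100.1.12 || true;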

The output of the template is then piped to the sh command, which executes the ping commands.

Improvements

There are always improvements to be made, but since this is only a workaround I’ve left them unimplemented:

  • Filter out pods which are in ready status
  • Log failed pings

Read more

To read more about the subject please consider the following links:

FAQ

A quick way to hunt down the available fields you can use within templates is to view the accompanying resource in JSON format using the Kubernetes client:

$ kubectl -n kube-system get pod <pod id> -o json
{
    "apiVersion": "v1",
    "kind": "Pod",
    "status": {
        "podIP": "10.100.0.3",
...
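
Once you have found the field path in the JSON output, you can plug it straight into a go-template, for example:

$ kubectl -n kube-system get pod <pod id> -o go-template='{{ .status.podIP }}{{ "\n" }}'
10.100.0.3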