Moving to Native Sidecars

Upgrading all our sidecar containers to Kubernetes Native Sidecars, including cloudsql-proxy and istio-proxy.

We recently upgraded our clusters to 1.29, which gave us access to Kubernetes Native Sidecar containers, a feature I've been waiting for, quite literally, for years. I've recently moved Istio over to them, as well as all of our cloudsql-proxy containers.

The Pain

Historically, we ran our "Sidecars" as ordinary containers in the PodSpec. All of these containers are started in parallel and shut down in parallel. Typically, though, Sidecar processes need to start before your application starts and shut down after it stops (istio-proxy and cloudsql-proxy being the obvious examples), so you can immediately see the problem.

In normal deployments, the workaround usually manifests as a pile of scripts or utilities that run as part of your application container's init process, effectively blocking it from starting until those dependencies are up. In our case, we had wait-for-istio and wait-for-cloudsql shimmed into the ENTRYPOINT of our application containers. On top of that, when the Pod was terminated, the SIGTERM would be sent to all containers in parallel, so your sidecars could shut down before your application had finished gracefully exiting. Again, this manifested as wrapper scripts that would do things like poll netstat to make sure there were no open connections from the app before exiting.
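
As a rough sketch of what that kind of shim looks like (assuming Istio's default status port of 15021; the image name and /app/server entrypoint here are illustrative, not our actual setup):

containers:
- name: app
  image: registry.example.com/app:latest   # illustrative
  command: ["/bin/sh", "-c"]
  args:
  - |
    # Block until the istio-proxy sidecar reports ready on its status port,
    # then exec the real entrypoint so signals reach the app directly.
    until wget -qO- http://127.0.0.1:15021/healthz/ready >/dev/null 2>&1; do
      sleep 1
    done
    exec /app/server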

Another massive pain point was CronJob and Job instances. In these scenarios, the main application process exits once it's complete, but the Sidecar does not, so the Job would just sit there forever. Initially, for us, this meant a wrapper script that caught the application exit and then hit the /quitquitquit endpoint on Istio; later this became a Kubernetes watch that looked for Pods in that state and then killed their Sidecars.
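
For illustration, a wrapper along those lines looks roughly like this (the Job name, image and /app/run-job path are hypothetical; /quitquitquit is the Istio agent's shutdown endpoint on port 15020):

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job                            # hypothetical
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: job
        image: registry.example.com/job:latest   # hypothetical
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Run the real workload, remember its exit code, then ask the
          # Istio agent to shut Envoy down so the Pod can actually complete.
          /app/run-job
          code=$?
          curl -fsS -X POST http://127.0.0.1:15020/quitquitquit || true
          exit $code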

This is all really leaky; applications shouldn't need to know about these sorts of data plane concerns, but we had no choice if we wanted to use the Sidecar pattern.

The Solution: Native Sidecars

Bit of an update here: I've discovered a bug in Kubernetes where, in the rather niche situation that you have more than one Sidecar and one of them fails to start before the Pod is terminated, it will not receive a termination signal.

If you're on 1.29+ (not 1.28, as there were some bugs in that alpha release of Sidecars), then you can use Sidecar containers. It's extremely simple, you:

  • Move your container from containers to initContainers
  • Set restartPolicy: Always on it, as sketched below
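
For cloudsql-proxy, a minimal sketch of the result (the image tag, project and instance names are illustrative; the health-check flags are there to support the startupProbe covered next):

spec:
  initContainers:
  - name: cloud-sql-proxy
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.11.0   # pin to whatever version you run
    restartPolicy: Always                    # this is what makes it a native Sidecar
    args:
    - --health-check                         # expose /startup, /liveness and /readiness
    - --http-address=0.0.0.0                 # so kubelet probes can reach it
    - --http-port=9739
    - my-project:europe-west1:my-instance    # illustrative instance connection name
  containers:
  - name: app
    image: registry.example.com/app:latest   # your application, unchanged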

Make sure that you set a startupProbe too: your main containers won't start until all of your initContainers are up, and the startupProbe is what signals that the initContainer is ready. On cloudsql-proxy, for us, it looks like this:

startupProbe:
  failureThreshold: 60
  httpGet:
    path: /startup
    port: 9739
    scheme: HTTP
  periodSeconds: 1
  successThreshold: 1
  timeoutSeconds: 10

And that's it. The process will start before your app and shut down after it, and you can spend some time ripping out all the hacky stuff you've had to implement to work around this too!

If you're using Istio, enabling Sidecars is as simple as setting ENABLE_NATIVE_SIDECARS on pilot:

spec:
  components:
    pilot:
      k8s:
        env:
        - name: ENABLE_NATIVE_SIDECARS
          value: 'true'

John Howard talks a little more about it here. Or, of course, depending on your requirements you could look at Ambient Mesh, where you don't need a sidecar at all.

Bit of an update here: I wouldn't do it with Istio just yet! There's a reasonably nasty bug I've stumbled across which means outbound connections from your app will start seeing Connection: close. This has pretty horrid interactions with some HTTP connection pools.

Gotchas

Other than the two updates above around Istio and multiple Sidecars: if, like me, you use kube-state-metrics to capture kube_pod_container_resource_requests to monitor your Sidecars (we use container_cpu_usage_seconds_total / kube_pod_container_resource_requests to calculate CPU usage as a percentage of the request, and alert when it's over 100% for 30 minutes), then you'll notice that stops working once they're moved to initContainers.
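
For reference, the alert in question looks roughly like the rule below (a sketch, not our production config; exact labels depend on how your cAdvisor and kube-state-metrics scrapes are set up):

groups:
- name: sidecar-cpu
  rules:
  - alert: ContainerOverCpuRequest
    expr: |
      # CPU used vs CPU requested, per container
      sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total[5m]))
        /
      sum by (namespace, pod, container) (kube_pod_container_resource_requests{resource="cpu"})
        > 1
    for: 30m
    annotations:
      summary: '{{ $labels.container }} in {{ $labels.pod }} has been over its CPU request for 30 minutes'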

It turns out that, for whatever reason, init containers are reported in kube_pod_init_container_resource_requests instead, though it has exactly the same format. I do kind of wish init was just a label! Anyway, I had a lot of downstream recording rules and so on that use this metric, so the quick and easy win for me was to simply relabel it:

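  # Applied wherever you do relabelling, e.g. under metric_relabel_configs on
  # the kube-state-metrics scrape job (adjust to your own setup):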
  - source_labels: [__name__]
    regex: 'kube_pod_init_container_resource_requests'
    target_label: __name__
    replacement: 'kube_pod_container_resource_requests'

Be careful with this approach, however, as it doesn't account for overlapping labels: if you had a container and an initContainer with the same name, you'd lose data because the last one in would win. We don't, so it was fine.