Moving to Native Sidecars
Upgrading all our sidecar containers to Kubernetes Native Sidecars, including cloudsql-proxy and istio-proxy.
We recently upgraded our clusters to 1.29, which gave us access to Kubernetes Native Sidecar containers, a feature I've been waiting on for quite literally years. I've since moved Istio to them, as well as all of our cloudsql-proxy containers.
The Pain
Historically, we ran our "Sidecars" as regular containers in the PodSpec. All of these containers are started in parallel, and shut down in parallel. Typically, sidecar processes, for example istio-proxy or cloudsql-proxy, need to be started before your application starts, and shut down after it stops, so you can immediately see the problem.
In normal deployments, the workaround usually manifests as a lot of scripts or utilities that you'd run as part of your application container's init process, to effectively block it from starting until those dependencies were running. In our case, we had wait-for-istio and wait-for-cloudsql shimmed into the ENTRYPOINT for our application containers. On top of that, when the application was terminated, the SIGTERM would get sent to all containers in parallel, which could shut down your sidecars before your application was done gracefully exiting. Again, this would manifest as script wrappers that would do things like poll netstat to make sure there were no open connections from the app before exiting.
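To make that concrete, the startup half of the shim looked something like the sketch below. The container name, image and exact readiness check are illustrative rather than our real manifests; istio-proxy does serve a readiness endpoint at /healthz/ready on port 15021, which is what a wait-for-istio style wrapper typically polls:

containers:
  - name: app
    image: example.com/app:latest            # hypothetical image
    command: ["/bin/sh", "-c"]
    args:
      - |
        # wait-for-istio: block until the sidecar reports ready,
        # then exec the real entrypoint so signals are forwarded properly
        until wget -qO- http://127.0.0.1:15021/healthz/ready >/dev/null 2>&1; do
          sleep 1
        done
        exec /app/server                     # hypothetical real entrypoint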
Another massive pain point was CronJob or Job instances. In these scenarios, the main application process would exit once complete, but the sidecar would not, which meant the Jobs would just sit there forever. Initially for us this meant a wrapper script that caught the application exit and then hit the /quitquitquit endpoint on Istio, and later this became a Kubernetes watch that looked for Pods in that terminating state and then killed their sidecars.
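The first of those workarounds looked roughly like this sketch (the Job, image and command are hypothetical; pilot-agent's /quitquitquit endpoint lives on its status port, 15020 by default):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                            # hypothetical Job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example.com/migrate:latest   # hypothetical image
          command: ["/bin/sh", "-c"]
          args:
            - |
              /app/migrate                    # the actual work
              code=$?
              # ask the Istio sidecar to exit so the Job can complete
              curl -fsS -X POST http://127.0.0.1:15020/quitquitquit || true
              exit $code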
This is all really leaky; applications shouldn't need to know about these sorts of data plane concerns, but we had no choice if we wanted to use the Sidecar pattern.
The Solution: Native Sidecars
Bit of an update here: I've discovered a bug in Kubernetes where, in the rather niche situation that you have more than one sidecar and one fails to start before the Pod is terminated, it will not receive a termination signal.
If you're on 1.29+ (not 1.28, as there were some bugs in that alpha release of Sidecars), then you can use native Sidecar containers. It's extremely simple, you:
- Move your container from containers to initContainers
- Set the restartPolicy: Always (see the sketch below)
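As a minimal sketch, here's roughly what that looks like for cloudsql-proxy (the image tag, instance name and app container are illustrative, not our exact manifest):

spec:
  initContainers:
    - name: cloudsql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.11.0   # illustrative tag
      restartPolicy: Always     # this is what makes it a native sidecar
      args:
        - my-project:europe-west1:my-instance                     # hypothetical instance
  containers:
    - name: app
      image: example.com/app:latest                               # hypothetical image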
Make sure that you set a startupProbe too, as your main containers won't start until all your initContainers are up; the startupProbe is what ensures the initContainer is ready. For example, on cloudsql-proxy, for us it looks like this:
startupProbe:
  failureThreshold: 60
  httpGet:
    path: /startup
    port: 9739
    scheme: HTTP
  periodSeconds: 1
  successThreshold: 1
  timeoutSeconds: 10
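Note that if you're on the v2 proxy, it only serves /startup when its health-check endpoints are enabled, so the container args need something along these lines (the exact values are assumptions to match the probe above):

args:
  - --health-check              # enables /startup, /liveness and /readiness
  - --http-address=0.0.0.0      # listen beyond localhost so the kubelet can probe it
  - --http-port=9739            # matches the startupProbe port above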
And that's it. The process will start before your app and shut down after it, and you can spend some time ripping out all the hacky stuff you've had to implement to work around it too!
If you're using Istio, enabling Sidecars is as simple as setting ENABLE_NATIVE_SIDECARS on pilot:
spec:
  components:
    pilot:
      k8s:
        env:
          - name: ENABLE_NATIVE_SIDECARS
            value: 'true'
John Howard talks a little more about it here. Or, of course, depending on your requirements you could look at Ambient Mesh, where you don't need a sidecar at all.
Bit of an update here: I wouldn't do it with Istio just yet! There's a reasonably nasty bug I've stumbled across which means outbound connections from your app will start seeing Connection: close headers. This has pretty horrid interactions with some HTTP connection pools.
Gotchas
Other than the two updates above around Istio and multi-sidecars: if, like me, you use kube-state-metrics to capture kube_pod_container_resource_requests to monitor your sidecars (for us, we use container_cpu_usage_seconds_total / kube_pod_container_resource_requests to calculate CPU usage as a percentage of the request, and alert when it's > 100% for 30 minutes), then you'll notice that stops working when they are moved to initContainers.
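For reference, that alert is roughly the following Prometheus rule; the names, label joins and the 5m rate window are illustrative rather than our exact rules:

groups:
  - name: sidecar-resources
    rules:
      - alert: SidecarCPUOverRequest
        # CPU used vs CPU requested, per container; fires after 30 minutes above 100%
        expr: |
          sum by (namespace, pod, container) (
            rate(container_cpu_usage_seconds_total[5m])
          )
          /
          sum by (namespace, pod, container) (
            kube_pod_container_resource_requests{resource="cpu"}
          )
          > 1
        for: 30m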
It turns out that, for whatever reason, init containers are reported in kube_pod_init_container_resource_requests instead; it has exactly the same format though. I do kind of wish init was just a label! Anyway, I had a lot of downstream recording rules etc. that use this metric, so the quick and easy win for me was to simply relabel it:
- source_labels: [__name__]
  regex: 'kube_pod_init_container_resource_requests'
  target_label: __name__
  replacement: 'kube_pod_container_resource_requests'
Be careful with this approach, however, as it doesn't consider overlapping labels: for example, if you had a container and an initContainer with the same name, you'd lose data because the last-in would win. We don't, so it was fine.