Istio Upgrades: Prometheus SDS

How to handle the migration to Istio SDS in your Prometheus instances.

If, like me, you run bespoke instances of Prometheus rather than the one which comes bundled with Istio, you've likely got some configuration that looks like this:

- job_name: 'kubernetes-pods-istio-secure'
  honor_labels: true
  scheme: https
  tls_config:
    ca_file: /etc/certs/root-cert.pem
    cert_file: /etc/certs/cert-chain.pem
    key_file: /etc/certs/key.pem
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  ...

If you do, then I'm sorry to tell you that it's going to stop working when you upgrade to Istio 1.6. And it'll fail subtly.

History

Before SDS became the default way of distributing mTLS certificates to your workloads, Citadel was responsible for creating a secret in your workload's namespace named istio.default (where default is the name of the workload's service account).

The typical pattern to enable Prometheus to scrape mTLS-protected endpoints was then to volume mount those certificates in:

volumes:
- name: "istio-certs"
  secret:
    defaultMode: 420
    secretName: "istio.default"
volumeMounts:
- mountPath: /etc/certs
  name: istio-certs
  readOnly: true

However, those secrets are now redundant and no longer get created by istiod. They're not deleted as part of the Istio upgrade process either, so nothing breaks straight away: the problem eventually shows up as mTLS failures, because Prometheus is still using certificates that are no longer being updated. Lovely hey :)

So let's look at how we can get updated certs into Prometheus.

(A) Solution

There may be better ways to do this, and I'm more than happy for someone to comment with a better solution - but this is how I got it working.  It was a faff.

I decided to run an istio-proxy on the Prometheus workload, and use a shared volume between istio-proxy and Prometheus to share the certs that istio-proxy would get via SDS. This was easier said than done.

Adding an istio-proxy to Prometheus

The first thing we need to do is ensure that the Prometheus StatefulSet runs an istio-proxy. I did this by adding the following annotations to its pod template:

sidecar.istio.io/inject: "true"
traffic.sidecar.istio.io/includeInboundPorts: ""
traffic.sidecar.istio.io/includeOutboundIPRanges: ""

Together, these cause a sidecar to be injected but, because both the inbound port list and the outbound IP ranges are empty, no iptables interception is configured.
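For orientation, here's a minimal sketch of where those annotations sit. It assumes a StatefulSet named prometheus in the app-metrics namespace carrying the app: prometheus label (the names used by the other resources in this post); the rest of the StatefulSet is left out, so this is a fragment rather than something you can apply as-is. The annotations go on the pod template rather than on the StatefulSet's own metadata, because the injector acts on pods.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus          # assumed name, matching the resources used elsewhere in this post
  namespace: app-metrics    # assumed namespace, as above
spec:
  # selector, serviceName, containers etc. omitted from this sketch
  template:
    metadata:
      labels:
        app: prometheus
      annotations:
        # Ask the injector for a sidecar...
        sidecar.istio.io/inject: "true"
        # ...but with empty inbound ports and outbound ranges,
        # so no traffic is redirected through it.
        traffic.sidecar.istio.io/includeInboundPorts: ""
        traffic.sidecar.istio.io/includeOutboundIPRanges: ""
```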

Configuring istio-proxy to write Certificates to disk

We then need to configure istio-proxy to write the certificates to disk (by default it won't) but most importantly, write them to a volume which can be shared with the main prometheus application.  These are the annotations I used to do that:

proxy.istio.io/config: |
  proxyMetadata:
    OUTPUT_CERTS: /etc/istio-output-certs
  
sidecar.istio.io/userVolume: '[{"name": "istio-certs", "emptyDir": {"medium":"Memory"}}]'

sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs", "mountPath": "/etc/istio-output-certs"}]'

The key part here is OUTPUT_CERTS, which tells istio-proxy to write the certificates received via SDS to that directory.

Note: It's very important you don't use /etc/certs for this path, see https://github.com/istio/istio/issues/28050

Reducing the istio-proxy footprint

I also added a Sidecar resource to reduce the memory footprint of the proxy. As this proxy isn't going to be used for any communication, it doesn't need any cluster configuration from istiod other than a single mTLS host (see this issue); in this case I'm using istiod itself for that.

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: prometheus
  namespace: app-metrics
spec:
  egress:
  - hosts:
    - istio-system/istiod.istio-system.svc.cluster.local
  workloadSelector:
    labels:
      app: prometheus

Ensuring no mTLS is used when talking to Prometheus

I added a PeerAuthentication policy to ensure that any of my apps that talk directly to Prometheus don't attempt to do so over mTLS:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: prometheus
  namespace: app-metrics
spec:
  mtls:
    mode: DISABLE
  selector:
    matchLabels:
      app: prometheus

Using the istio-proxy certificates in Prometheus

At this point we have a sidecar which is writing the certificates to disk, so I added a volumeMount on the Prometheus StatefulSet for the Istio certificates, which references the volume added by the sidecar injector:

volumeMounts:
- mountPath: /etc/prom-certs/
  name: istio-certs

And also updated the prometheus.yaml file to reference the new folder:

  tls_config:
    ca_file: /etc/prom-certs/root-cert.pem
    cert_file: /etc/prom-certs/cert-chain.pem
    key_file: /etc/prom-certs/key.pem
    insecure_skip_verify: true
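Put together with the job from the start of this post, the scrape config would look roughly like this. It's only a sketch: the job name and settings are copied from the earlier example, just the three certificate paths change, and the relabel rules stay elided as they were there.

```yaml
- job_name: 'kubernetes-pods-istio-secure'
  honor_labels: true
  scheme: https
  tls_config:
    # Certificates written by istio-proxy (via OUTPUT_CERTS), as seen
    # through the /etc/prom-certs/ mount on the prometheus container.
    ca_file: /etc/prom-certs/root-cert.pem
    cert_file: /etc/prom-certs/cert-chain.pem
    key_file: /etc/prom-certs/key.pem
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  # relabel_configs: keep whatever you already have here (elided in the earlier example too)
```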

Conclusion

That should get your mTLS scrapes working again.   I find it a bit annoying that I've had to make modifications to the injector config in order to get this working.