Istio now has a first-class API for configuring telemetry. Previously, you'd configure telemetry in the meshConfig section of the Istio configuration. Switching across wasn't seamless, so I figured I'd share the gotchas I ran into here.
If, like me, you were removing some dimensions from Istio metrics to reduce cardinality, then you'll have had a section in your config that looks like this:
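A minimal sketch of that kind of override, assuming it lived under the telemetry v2 Prometheus `configOverride` values of an IstioOperator. The metric and tag names are illustrative assumptions, not my exact config:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            # Server-side (destination) metrics
            inboundSidecar:
              metrics:
                - name: requests_total
                  tags_to_remove:
                    - request_protocol   # illustrative tag
            # Client-side (source) metrics
            outboundSidecar:
              metrics:
                - name: requests_total
                  tags_to_remove:
                    - request_protocol   # illustrative tag
```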
When you write your new Telemetry CR, the above config would look like this:
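Roughly, and assuming a mesh-wide resource in the `istio-system` root namespace (the resource name and the tag being removed are placeholders), dropping a tag from the request count metric looks something like:

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default        # placeholder name
  namespace: istio-system   # assumed root namespace, so it applies mesh-wide
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: REQUEST_COUNT
            mode: CLIENT_AND_SERVER
          tagOverrides:
            request_protocol:      # illustrative tag
              operation: REMOVE
```

Each entry in `overrides` pairs a `match` (which metric, and whether it's reported by the client or server side) with the tag changes to apply.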
WARNING! ALL_METRICS doesn't seem to work!
Yeah... for some reason I had to explicitly disable each metric on SERVER. It feels like a bug, and I've raised it here.
I won't go into the rest of the Custom Resource spec in too much detail, as it's quite well documented in the Telemetry Metrics Overrides documentation. The slightly confusing part, though, is that in your istiooperator.yaml you need to disable telemetry:
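As a sketch, assuming the standard `values.telemetry` keys from the Istio install values (check these against the version you're running):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    telemetry:
      # Stops the installer from generating the telemetry v2 EnvoyFilters,
      # leaving the Telemetry API as the only source of metrics config.
      enabled: false
      v2:
        enabled: false
```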
It's a bit counterintuitive if you ask me, but without this I found that the EnvoyFilters for telemetry V2 would still get created, and I ended up with two sets of metrics, with different tag configurations, being output on my sidecars' metrics endpoint. That increased the scrape size of my cluster pretty significantly until I'd done a rolling restart of all the Istio workloads.
Another thing I did previously was to remove the Destination metrics and only record metrics at the source. I did this by editing the EnvoyFilter to remove the SIDECAR_INBOUND section. In the new Telemetry API the terminology is SERVER, so in our example we want to remove the SERVER metrics.
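A sketch of what that can look like, disabling each metric explicitly in SERVER mode rather than relying on ALL_METRICS. The resource name and namespace are assumptions, and only the HTTP metrics are listed; the TCP and gRPC metrics follow the same pattern:

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default        # placeholder name
  namespace: istio-system   # assumed root namespace
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        # One entry per metric; repeat for the TCP_* and GRPC_* metrics.
        - match:
            metric: REQUEST_COUNT
            mode: SERVER
          disabled: true
        - match:
            metric: REQUEST_DURATION
            mode: SERVER
          disabled: true
        - match:
            metric: REQUEST_SIZE
            mode: SERVER
          disabled: true
        - match:
            metric: RESPONSE_SIZE
            mode: SERVER
          disabled: true
```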
That's about it for Prometheus!
My original driver for moving to Telemetry V2 was to let me explore sending traces to a tracer other than Jaeger. The documentation gives the following example:
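A sketch of its general shape, assuming the `tracing` section of the Telemetry API. The provider name and sampling value are placeholders, not copied from the docs:

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default        # placeholder name
  namespace: istio-system   # assumed root namespace
spec:
  tracing:
    - providers:
        - name: zipkin      # placeholder; must match an extension provider name
      randomSamplingPercentage: 100
```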
And here's the gotcha: notice that providers is an array; however, it can't actually take more than one entry. You can only send to a single tracer at any time, and if you put multiple items in here, only the first will be used. I would assume (and hope) that the plan is to add support for multiple providers in the future.
Ignoring the gotcha above, these are the changes I needed to make in my Istio configuration:
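Something along these lines, assuming the providers are declared under `meshConfig.extensionProviders`. The provider names, service hostnames and ports are hypothetical stand-ins for my own:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    enableTracing: false
    extensionProviders:
      # Option 1: a Zipkin-compatible endpoint
      - name: zipkin
        zipkin:
          service: zipkin.istio-system.svc.cluster.local      # hypothetical host
          port: 9411
      # Option 2: an OpenCensus agent, using B3 headers like Zipkin
      - name: opencensus
        opencensus:
          service: otel-collector.observability.svc.cluster.local   # hypothetical host
          port: 55678
          context:
            - B3
```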
This replaced this section:
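As a rough sketch of the older style, assuming tracing was configured through `meshConfig.defaultConfig.tracing` (the address and sampling value are placeholders):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 10        # placeholder percentage
        zipkin:
          address: zipkin.istio-system.svc.cluster.local:9411   # hypothetical host
```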
Things to note here: similar to the above example where I disabled telemetry, I also set enableTracing: false. Extension providers are well documented; think of these as configuration options for potential destinations, which we'll reference in the Telemetry CR. In the example above you can see that I've configured two options. Note how I'm using B3 as the context for opencensus; that enables me to switch between the two and use the same header propagation method as Zipkin.
Another gotcha here is that the new Telemetry system uses EDS to discover endpoints for the extensionProvider. That means if you use a Sidecar resource, you must have the provider's host defined in it. Here's an example mesh-global Sidecar:
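A minimal sketch, assuming `istio-system` is the root namespace and reusing hypothetical tracer hostnames. The important part is that every extension provider's host appears in the egress hosts:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system   # root namespace + no workloadSelector = mesh-wide default
spec:
  egress:
    - hosts:
        - "./*"             # services in the workload's own namespace
        - "istio-system/*"
        # Hosts backing the extension providers (hypothetical names):
        - "istio-system/zipkin.istio-system.svc.cluster.local"
        - "observability/otel-collector.observability.svc.cluster.local"
```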
Without this, your Sidecar will fail to load the new config, as it'll complain about an unknown cluster. I don't think the UX there is great, and I've raised it on GitHub.
To activate your new configuration, add the following section to your Telemetry config:
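Sketched out, and assuming a provider named `opencensus` was declared under `extensionProviders` (the resource and provider names are placeholders):

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default        # placeholder name
  namespace: istio-system   # assumed root namespace
spec:
  tracing:
    - providers:
        - name: opencensus  # only the first provider listed is used
      randomSamplingPercentage: 10
```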
Here you can see I've referenced the provider I created above, with a sample rate of 10%.
That's broadly it. Having a first-class API for Telemetry is great. Being able to create application-specific overrides with targeted CRs is lovely.
Migrating wasn't too bad; hopefully this doc will make it a little simpler for the next person.