Upgrading to Istio 1.8 & 1.9

Some good news! 1.8 and 1.9 were a lot less painful than previous releases, so I bundled them into a single blog post. Remember that you shouldn't skip versions when upgrading, so if you're still on 1.7, go through 1.8 on the way to 1.9. This post covers the gotchas I encountered during that process.

[1.8] Upgrades Broken if you use Sidecar.ingress

If, like me, you make use of ingress in your Sidecar resource, like this:

spec:
  egress:
  ...
  ingress:
  - defaultEndpoint: 127.0.0.1:8080
    port:
      name: http-app
      number: 8080
      protocol: HTTP

then upgrades to 1.8 are broken. Fortunately this is fixed in 1.8.3, so make sure you're going to 1.8.3+!

[1.8] Ingress gateway is always deployed

In 1.7 and earlier we disabled ingress-gateway by setting it to an empty array:

spec:
  components:
    ingressGateways: []

However, on 1.8 this results in ingress gateways being deployed anyway. You need to disable them explicitly:

spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: false

The issue is still open, so keep an eye out for this one.

We catch these issues by using src/#{version}/bin/istioctl manifest generate -f istiooperator.yaml to generate manifest files that get checked into Git. We then look at the Git diff. Over half of the issues we've found in Istio have been templating related, and we've caught them this way.

Personally, this is why I don't trust the operator model: these changes would already have been applied to my cluster, which is far too late in the process to detect them.
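As a rough sketch of that render-and-diff workflow, the script below generates the manifests with a pinned istioctl and prints the Git diff for review. The VERSION default, the manifests/ output directory and the function names are assumptions made for illustration; only the istioctl manifest generate -f istiooperator.yaml invocation and the src/<version>/bin layout come from the workflow described above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed names: VERSION, OUT_DIR and both function names are illustrative.
VERSION="${VERSION:-1.9.1}"
ISTIOCTL="${ISTIOCTL:-src/${VERSION}/bin/istioctl}"
OUT_DIR="${OUT_DIR:-manifests}"

# Render the IstioOperator spec to plain Kubernetes manifests on disk.
render_manifests() {
  mkdir -p "${OUT_DIR}"
  "${ISTIOCTL}" manifest generate -f istiooperator.yaml > "${OUT_DIR}/istio.yaml"
}

# Show what the new istioctl version would change before anything is applied.
# Note: git diff only reports files that are already tracked.
review_changes() {
  git --no-pager diff --stat -- "${OUT_DIR}"
  git --no-pager diff -- "${OUT_DIR}"
}
```

Calling render_manifests and then review_changes from the repository root, before committing, surfaces templating changes (such as an ingress gateway appearing unexpectedly) while they are still just a diff.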

[1.8 & 1.9] Big increase in status codes with a DC response flag

I'm not going to go into too much detail here, as a lot of the detail is on the issue. At a high level, however, we're now seeing a fair percentage of requests being marked as DC (downstream connection terminated). Historically these were only recorded when the downstream terminated the connection, for example during a timeout, which would result in a 0DC.

However, on 1.8+ we're now seeing them across all status codes, and it's particularly bad for applications which do not use connection pooling (and so create many more connections).

I find this a little frustrating because:

  • It increases the cardinality of your Prometheus metrics, as you now have a DC dimension on all response codes that wasn't there before
  • It confuses users, as their metrics are now showing some requests as 200 and others as 200DC

There's some talk of it being related to TCP half-close, but I haven't been able to confirm this yet, and I'm not sure why it would have changed in 1.8+.

As I see it, there are a couple of mitigations:

  • You can enable the HTTP/2 upgrade (h2UpgradePolicy) in your DestinationRule. This upgrades the connection between the source and destination Envoy to HTTP/2 and makes use of HTTP/2 multiplexing. However, I still have my concerns about this feature, as explained in this issue I opened 9 months ago
  • Enable connection pooling in the HTTP clients in your apps
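The first mitigation could look something like the sketch below, which writes a DestinationRule manifest to disk so it can be reviewed and checked into Git like any other manifest. It assumes the setting in question is trafficPolicy.connectionPool.http.h2UpgradePolicy; the my-app and my-namespace names, the host and the function name are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Writes a DestinationRule that asks the sidecars to upgrade HTTP/1.1 to HTTP/2.
# my-app / my-namespace are placeholder names, not taken from a real cluster.
write_destination_rule() {
  local out_file="$1"
  cat > "${out_file}" <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
  namespace: my-namespace
spec:
  host: my-app.my-namespace.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        # Upgrade connections between source and destination Envoy to HTTP/2
        h2UpgradePolicy: UPGRADE
EOF
}
```

For example, write_destination_rule manifests/my-app-destinationrule.yaml produces a file that can go through the same Git diff review as the rest of the mesh config before it is applied.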

[1.9] ISTIO-SECURITY-2021-001

Hopefully you've seen the security bulletin, but if not, the high-level summary is that if you're using RequestAuthentication alone for JWT validation, you're vulnerable. This is fixed in 1.9.1, so make sure you skip 1.9.0, or if you're already there, upgrade pronto.

[1.9] Significant reduction in memory

I don't want my blog posts to be all doom and gloom, so here's one massive win for 1.9.1: a 20%+ reduction in memory footprint for istio-proxy! Finally it's going in the right direction.

Summary

All in all these upgrades were significantly better than previous releases. Clearly the feedback is being heard and the developers working on the project are taking upgrade pain seriously. Kudos.