Upgrading to Istio 1.8 & 1.9
Some good news! 1.8 and 1.9 were a lot less painful than previous releases, so I bundled them into a single blog post. Remember that you shouldn't skip versions when upgrading, so if you're still on 1.7, go through 1.8 on the way to 1.9. This post covers the gotchas I encountered during that process.
[1.8] Upgrades Broken if you use Sidecar.ingress
If, like me, you make use of ingress in your Sidecar resource, upgrades to 1.8 are broken. My config looks like this:
    spec:
      egress:
        ...
      ingress:
      - defaultEndpoint: 127.0.0.1:8080
        port:
          name: http-app
          number: 8080
          protocol: HTTP
Fortunately this is fixed in 1.8.3, so make sure you're going to 1.8.3+!
[1.8] Ingress gateway is always deployed
In 1.7 and earlier we disabled ingress-gateway by setting it to an empty array:
    spec:
      components:
        ingressGateways:
However, on 1.8 this results in ingress gateways being deployed anyway. You need to disable them explicitly:
    spec:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
The issue is still open, so keep an eye out for this one.
We catch these issues by using src/#{version}/bin/istioctl manifest generate -f istiooperator.yaml to generate manifest files that get checked into Git. We then look at the Git diff. 50%+ of the issues we've found in Istio have been templating related, and we've caught them this way.
Personally, this is why I don't trust the operator model: with it, these changes would already have been applied to my cluster, which is far too late in the process to detect them.
[1.8 & 1.9] Big increase in status codes with a DC response flag
I'm not going to go into too much detail here, as a lot of the detail is on the issue. At a high level, however, we're now seeing a fair percentage of requests being marked as DC (downstream connection terminated). Historically these were only recorded when the downstream terminated the connection, for example during a timeout, which would result in a 0DC.
However, on 1.8+ we're now seeing them across all status codes, and it's particularly bad for applications which do not use connection pooling (and so create many more connections).
I find this a little frustrating because:
- It increases the cardinality of your Prometheus metrics, as you now have a DC dimension on all response codes that wasn't there before
- It confuses users, as their metrics are now showing some requests as 200 and others as 200DC
There's some talk of it being related to TCP Half Close, but as of yet I haven't been able to confirm this, and I'm not sure why it would have changed in 1.8+.
There are a couple of mitigations:
- You can enable http2Upgrade in your DestinationRule. This will upgrade the connection between the source and destination envoy to http2 and make use of HTTP/2 multiplexing. However, I still have my concerns about this feature, as explained in this issue I opened 9 months ago
- Enable connection pooling in the http clients in your apps
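For reference, here is a rough sketch of what the first mitigation could look like. In the DestinationRule API the setting is the h2UpgradePolicy field under trafficPolicy.connectionPool.http. The resource name, namespace and host below are made up for illustration, so swap in your own, and treat this as a starting point rather than a tested config:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app              # hypothetical name
  namespace: my-namespace   # hypothetical namespace
spec:
  host: my-app.my-namespace.svc.cluster.local  # hypothetical destination service
  trafficPolicy:
    connectionPool:
      http:
        # Ask the source envoy to upgrade HTTP/1.1 connections
        # to this destination to HTTP/2.
        h2UpgradePolicy: UPGRADE
```

Scoping this to a single host lets you try it on one of the noisier services first, rather than switching it on mesh-wide.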
[1.9] ISTIO-SECURITY-2021-001
Hopefully you've seen the security bulletin, but if not, the high level is that if you're using RequestAuthentication alone for JWT validation, you're vulnerable. This is fixed in 1.9.1, so make sure you skip 1.9.0, or if you're already there, upgrade pronto.
[1.9] Significant reduction in memory
I don't want my blog posts to be all doom and gloom, so here's one massive win for 1.9.1: a 20%+ reduction in memory footprint for istio-proxy! Finally it's going in the right direction.
Summary
All in all these upgrades were significantly better than previous releases. Clearly the feedback is being heard and the developers working on the project are taking upgrade pain seriously. Kudos.