Speeding up Jaeger on Elasticsearch

We use Elasticsearch as the backend for Jaeger. We also don't sample, so we record every request, which means we store a lot of data. To put "a lot" into perspective: yesterday we stored 4,523,923,888 spans, around 2TB. Eek.

We found that at this level of volume, you can't ignore your Elasticsearch configuration. Query performance will tank as the daily index grows, and by the end of the day the Jaeger GUI can become almost unusable for searching. There was some discussion about this here https://github.com/jaegertracing/jaeger/pull/1475, but it ended up being closed due to inactivity, so I've decided to write this post to help others tune their Jaeger and Elasticsearch relationship.

It's worth noting that we use Jaeger via the Istio integration. Some of the tags I talk about below might be different in your setup.

Field Mappings

By default, Jaeger will create an Elasticsearch index template for you. That template isn't going to cut it, so you need to set ES_CREATE_INDEX_TEMPLATES=false. You'll then need to grab yourself a copy of the existing index template, either from Elasticsearch or GitHub, so we can start making some modifications.
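
If Jaeger has already been writing to your cluster, the quickest way to get that copy is to pull the template Jaeger created straight out of Elasticsearch. A minimal sketch, assuming the default template name jaeger-span (the exact name depends on your index prefix and Jaeger version, and older setups use the legacy _template API rather than _index_template):

GET _index_template/jaeger-span

GET _template/jaeger-span

Save the body of whichever one exists in your cluster and use it as the base for the modifications below.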

I'm going to assume a base level of Elasticsearch knowledge from this point forward, so I won't be covering how to apply mapping templates to your cluster. You can read the Elasticsearch documentation on that here.

eager_global_ordinals

By far one of the biggest bang-for-buck changes you can make is enabling eager_global_ordinals on fields you regularly aggregate on. Global ordinals are essentially a compact numeric representation of a field's string values. By default, Elasticsearch builds them lazily at query time, and when a lot of new data is being ingested there's always fresh work to do before an aggregation can run. Setting eager_global_ordinals: true shifts that work to write time: the ordinals are built up front as newly written documents become searchable, drastically improving performance at query time.

We dig into our data in various different ways, so we ended up setting this flag on the following fields (a trimmed mapping sketch follows the list):

  • traceID
  • spanID
  • operationName
  • tag.http@url
  • tag.upstream_cluster
  • process.serviceName
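
In the template, that just means adding the flag to each of those field mappings. A trimmed sketch based on the default Jaeger span mapping (only a few fields shown, and your copy of the template may define them slightly differently; the tag.* fields such as tag.http@url are handled via dynamic templates, as in the next section):

{
  "template": {
    "mappings": {
      "properties": {
        "traceID": {
          "type": "keyword",
          "ignore_above": 256,
          "eager_global_ordinals": true
        },
        "operationName": {
          "type": "keyword",
          "ignore_above": 256,
          "eager_global_ordinals": true
        },
        "process": {
          "properties": {
            "serviceName": {
              "type": "keyword",
              "ignore_above": 256,
              "eager_global_ordinals": true
            }
          }
        }
      }
    }
  }
}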

URL tokenisation

We want to be able to search across the tag.http@url tag in a variety of different ways. Let's take https://www.google.com/some/path?query1=a&query2=b as an example. By default that field is analysed as keyword, which is a pretty reasonable default: it allows us to do exact and partial matching. But let's say we want to find all spans to the host www.google.com; suddenly we're having to write tag.http@url:*www.google.com*. A wildcard at both the start and the end will perform terribly, especially at the volume of data we have. This is where tokenisation analysis comes in.

The following analyser will (crudely) split that URL up into segments representing the scheme, host, path, and querystring parameters.

{
  "template": {
    "settings": {
      "analysis": {
        "char_filter": {
          "custom_regex_split": {
            "type": "pattern_replace",
            "pattern": "(https?)://([^/\\?]+)(/[^\\?]+)?\\??(.*)?",
            "replacement": "$1_SPLIT_$2_SPLIT_$3_SPLIT_$4"
          }
        },
        "tokenizer": {
          "url_tokenizer": {
            "type": "pattern",
            "pattern": "_SPLIT_|&"
          }
        },
        "filter": {
          "remove_split": {
            "type": "pattern_replace",
            "pattern": "_split_",
            "replacement": ""
          },
          "remove_empty_filter": {
            "type": "stop",
            "stopwords": [""]
          }
        },
        "analyzer": {
          "custom_url_analyzer": {
            "type": "custom",
            "tokenizer": "url_tokenizer",
            "char_filter": ["custom_regex_split"],
            "filter": ["lowercase", "remove_split", "remove_empty_filter"]
          }
        }
      }
    },
    "mappings": {
      "dynamic_templates": [
        {
          "http_url": {
            "mapping": {
              "type": "text",
              "analyzer": "custom_url_analyzer",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256,
                  "eager_global_ordinals": true
                }
              }
            },
            "path_match": "tag.http@url"
          }
        }
      ]
    }
  }
}

You can test it in the Kibana Dev Tools console by hitting the index/_analyze endpoint:
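
Something along these lines, where prod-jaeger-span-2023-01-01 is just a placeholder for one of your daily span indices:

POST prod-jaeger-span-2023-01-01/_analyze
{
  "analyzer": "custom_url_analyzer",
  "text": "https://www.google.com/some/path?query1=a&query2=b"
}

You should get back tokens roughly like https, www.google.com, /some/path, query1=a and query2=b.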

Great, we'd now be able to filter by tag.http@url: www.google.com, or tag.http@url: /some/path, in a really performant way. Note how I've used eager_global_ordinals on the keyword sub-field here too.
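
If you query Elasticsearch directly, that looks something like the following (the index pattern is just our naming convention, swap in yours):

GET prod-jaeger-span-*/_search
{
  "query": {
    "match": {
      "tag.http@url": "www.google.com"
    }
  }
}

No leading wildcard needed: the match query text is run through the same custom_url_analyzer, so it hits the www.google.com token directly.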

Index Settings

There are some other index settings you should really consider (a sketch of the combined settings block follows the list):

  • index.refresh_interval: 60s. We're ingesting large amounts of data, and refreshing the searchable index every second (the default) is very expensive. Set this to a value you're comfortable with, balancing how quickly results appear in Jaeger vs cluster load.
  • index.codec: best_compression. To save on some storage costs at the expense of a small amount of CPU.
  • index.number_of_shards: X. You'll need to find the right balance for your load. General best practice is to keep shards below roughly 50GB each. Ours sit around 35GB with number_of_shards: 30.
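
Pulled together, the settings block of our template ends up looking roughly like this, sitting alongside the analysis settings from earlier (the shard count is what works for us, not a recommendation):

{
  "template": {
    "settings": {
      "index": {
        "refresh_interval": "60s",
        "codec": "best_compression",
        "number_of_shards": 30
      }
    }
  }
}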

Query Caching

This might be controversial, but we set index.queries.cache.enabled: false on our prod-jaeger-span template. You can see here how we have over 1GB of on-heap query cache with a measly 2% cache hit ratio. I'd rather that memory was free for the cluster to perform complex aggregations etc.

You should obviously measure this yourself; we use elasticsearch-exporter to get the metrics out, and we found a few high-write indices where this was the case.
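
If you want to sanity-check your own hit ratio without the exporter, the index stats API exposes the same counters (again, the index pattern is ours, adjust to your naming):

GET prod-jaeger-span-*/_stats/query_cache

The response includes query_cache.memory_size_in_bytes, hit_count and miss_count, which is enough to work out whether the cache is earning its keep.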

Bonus Content!

Now that you've tokenised your URLs and built your global ordinals up front, you can run some interesting aggregations on your data really quickly. The query below finds what we call "Request Amplification": services that fan out to many other services. For example, service-a might receive 1 request but make 200 downstream calls; in some cases we've spotted thousands of downstream calls. You can use a query like this to highlight bugs or areas for optimisation (for example, where a batch endpoint might be suitable).

{
  "_source": false,
  "size": 0,
  "aggs": {
    "min_startTimeMillis": {
      "min": {
        "field": "startTimeMillis"
      }
    },
    "max_startTimeMillis": {
      "max": {
        "field": "startTimeMillis"
      }
    },
    "nested_references": {
      "nested": {
        "path": "references"
      },
      "aggs": {
        "top_spanIDs": {
          "terms": {
            "field": "references.spanID",
            "size": 5,
            "min_doc_count": "2000",
            "order": {
              "_count": "desc"
            }
          },
          "aggs": {
            "correlate_to_parent": {
              "reverse_nested": {},
              "aggs": {
                "service_names": {
                  "terms": {
                    "field": "process.serviceName",
                    "size": 1
                  }
                },
                "trace_id": {
                  "terms": {
                    "field": "traceID",
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Check it out: here we can see that under span 1e87ccaffd1d27cb, sales-insight-admin.sales-insight-admin received one request but made 3,236 downstream calls. Ouch!

Summary

Implementing these changes made a significant improvement to the search performance of our Jaeger data. You can see here the before and after average response time of some scheduled scraping we do against the index:

I've put our whole index template on a gist here.