Scaling Temporal: The Basics

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • kube-prometheus

    Use Prometheus to monitor Kubernetes and applications running on Kubernetes

  • For our load testing we’ve deployed Temporal on Kubernetes, and we’re using MySQL for the persistence backend. The MySQL instance has 4 CPU cores and 32GB RAM, and each Temporal service (Frontend, History, Matching, and Worker) has 2 pods, with requests for 1 CPU core and 1GB RAM as a starting point. We’re not setting CPU limits for our pods—see our upcoming Temporal on Kubernetes post for more details on why. For monitoring we’ll use Prometheus and Grafana, installed via the kube-prometheus stack, giving us some useful Kubernetes metrics.

  • benchmark-workers

    Pre-written workflows and activities useful for benchmarking Temporal

  • To create load on a Temporal Cluster, we need to start Workflows and have Workers to run them. To make it easy to set up load tests, we have packaged a simple Workflow and some Activities in the benchmark-workers package. Running a benchmark-worker container will bring up a load test Worker with default Temporal Go SDK settings. The only configuration it needs out of the box is the host and port for the Temporal Frontend service.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • temporal

    Temporal service

  • However, as we mentioned, each shard needs management. Part of the management includes a cache of Workflow histories for that shard. We can see the History pods’ memory usage is rising quickly. If the pods run out of memory, Kubernetes will terminate and restart them (OOMKilled). This causes Temporal to rebalance the shards onto the remaining History pod(s), only to then rebalance again once the new History pod comes up. Each time you make a scaling change, be sure to check that all Temporal pods are still within their CPU and memory requests—pods frequently being restarted is very bad for performance! To fix this, we can bump the memory limits for the History containers. Currently, it is hard to estimate the amount of memory a History pod is going to use because the limits are not set per host, or even in MB, but rather as a number of cache entries to store. There is work to improve this: github.com/temporalio/temporal/issues/2941. For now, we’ll set the History memory limit to 8GB and keep an eye on them—we can always raise it later if we find the pod needs more.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts