How DataStax Tracked Down a Linux Kernel Bug with Fallout

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

Apache Pulsar

10 13,706 9.8 Java

Apache Pulsar - distributed pub-sub messaging system

Fallout is an open-source distributed systems testing service that we use heavily at DataStax to run functional and performance tests for Apache Cassandra, Apache Pulsar, and other projects. Fallout automatically provisions and configures distributed systems and clients, runs a variety of workloads and benchmarks, and gathers test results for later analysis.
prometheus

59 52,556 9.9 Go

The Prometheus monitoring system and time series database.

Cassandra has a number of tools to understand what’s happening internally as it serves data to clients. A lot of this is tracked as metrics in monitoring tools such as Prometheus or Grafana which integrate with Fallout. Checking those metrics showed that request throughput (requests per second) dropped off to near zero whenever we triggered the bug. To understand what was happening on the server side, I waited for the bug to occur and then took a look at the output of nodetool tpstats.
InfluxDB

www.influxdata.com
sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Grafana

56 60,196 10.0 TypeScript

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

Cassandra has a number of tools to understand what’s happening internally as it serves data to clients. A lot of this is tracked as metrics in monitoring tools such as Prometheus or Grafana which integrate with Fallout. Checking those metrics showed that request throughput (requests per second) dropped off to near zero whenever we triggered the bug. To understand what was happening on the server side, I waited for the bug to occur and then took a look at the output of nodetool tpstats.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project