Benchmarking: TimescaleDB vs. ClickHouse

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • tsbs

    Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data

    (Timescale co-founder)

    I'll answer this here with a similar response that I gave Pradeep (the author) via Twitter.

    I think ClickHouse is a great technology. It totally beats TimescaleDB for OLAP queries. I'll be the first to admit that.

    What our (100+ hour, 3 month analysis) benchmark showed is that for _time-series workloads_, TimescaleDB fared better. [0]

    Pradeep's analysis - while earnest - is essentially comparing OLAP style queries using a dataset that is not very representative of time-series workloads. Which is why the time-series benchmark suite (TSBS) [1] exists (which we did not create, although we now maintain it). I've asked Pradeep to compare using the TSBS - and he said he'd look into it. [2]

    As a developer, I'm very wary of technologies that claim to be better at everything - especially those that hide their weaknesses. We don't do that at TimescaleDB. For those who read our benchmark closely, we clearly show where ClickHouse beats TimescaleDB, and where TimescaleDB does better. And - despite what many commenters on here may want you to think - we heap loads of praise on ClickHouse.

    As a reader of HackerNews, I'm also tired of all the negativity that's developing on this site. People who bully. People who default to accusing others of dishonesty instead of trying to have a meaningful dialogue and reach mutual understanding. People who enter debates wanting to be right, versus wanting to identify the right answer. Disappointingly, this includes some visible influencers whom I personally know. We should all strive to do better, to assume positive intent, and have productive dialogues.

    (This is why one of our values at TimescaleDB is "Assume Positive Intent." [3] I think Hacker News - and the world in general - would be a much better, happier, healthier place if we all just did that.)

    [0] https://blog.timescale.com/blog/what-is-clickhouse-how-does-...

    [1] https://github.com/timescale/tsbs

    [2] https://twitter.com/p_chhetri/status/1455216425807745025

    [3] https://www.timescale.com/careers

  • VictoriaMetrics

    VictoriaMetrics: fast, cost-effective monitoring solution and time series database

  • clickhouse_fdw

    ClickHouse FDW for PostgreSQL

    In this case PostgreSQL may be able to come to the rescue :)

    There is a ClickHouse FDW for PostgreSQL, which in some cases can provide great speed with full join support:

    https://github.com/adjust/clickhouse_fdw

  • TimescaleDB

    An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

  • pmacct

    pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].

    While I'm not a current customer of Timescale, I do use the open source version of Timescale extensively, so I feel like I can summarize some of the benefits of Timescale over other TSDBs. The company is mid-sized with awkward data: 4+ PB of unstructured data, with our Postgres cluster hosting about 20 TB of data.

    The main advantage, from my perspective, is that you can query across business data and time series data with all the advantages that Postgres has. Time series data, while useful on its own, becomes incredibly powerful when it can be combined with your business and production data.

    A great example is our outbound network data monitoring. We use pmacct (http://www.pmacct.net/) to send network flows from our firewall to Postgres, keep host inventory data in Postgres, and use a foreign data wrapper around our LDAP data to determine user / host assignment. From that we can correlate every data flow to the user who is assigned to the host that generated that particular flow. This makes for some pretty powerful security reporting. Outside of that, we use Timescale's hypertables in a number of places that aren't explicitly time series data, like syslog data, web server logs, etc. This allows for some pretty amazing reporting on log data that is timeboxed, like "give me all the 500 errors from our HTTP log that have an IP address in Finland in the last 3.5 hours" (did I mention that we load GeoIP data into Postgres every night?).
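
    The "500 errors from Finland in the last 3.5 hours" report can be sketched as a join between a log table and a GeoIP table. The block below is only an illustration, not the commenter's actual setup: it uses Python's built-in SQLite as a stand-in for Postgres, and the table names (`http_log`, `geoip`), column names, and sample rows are all hypothetical. Real GeoIP data maps address ranges rather than single addresses, and in Postgres the time filter would more likely be written against `now() - interval '3.5 hours'`.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema; SQLite stands in for Postgres so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE http_log (ts TEXT, client_ip TEXT, status INTEGER, path TEXT);
    CREATE TABLE geoip (ip TEXT PRIMARY KEY, country TEXT);
""")

# A fixed "now" keeps the example repeatable.
now = datetime(2021, 11, 2, 12, 0, tzinfo=timezone.utc)


def iso(hours_ago):
    """ISO-8601 timestamp for a point `hours_ago` hours before `now`."""
    return (now - timedelta(hours=hours_ago)).isoformat()


conn.executemany("INSERT INTO http_log VALUES (?, ?, ?, ?)", [
    (iso(1.0), "192.0.2.10", 500, "/checkout"),  # Finland, recent, 500: matches
    (iso(2.0), "192.0.2.10", 200, "/"),          # Finland, recent, not an error
    (iso(5.0), "192.0.2.10", 500, "/checkout"),  # Finland, 500, outside window
    (iso(0.5), "198.51.100.7", 500, "/login"),   # recent 500, not Finland
])
conn.executemany("INSERT INTO geoip VALUES (?, ?)", [
    ("192.0.2.10", "FI"),
    ("198.51.100.7", "SE"),
])


def recent_errors(conn, country, hours, now):
    """Return 500 errors from `country` within the last `hours` hours."""
    cutoff = (now - timedelta(hours=hours)).isoformat()
    return conn.execute("""
        SELECT l.ts, l.client_ip, l.path
        FROM http_log AS l
        JOIN geoip AS g ON g.ip = l.client_ip
        WHERE l.status = 500 AND g.country = ? AND l.ts >= ?
        ORDER BY l.ts
    """, (country, cutoff)).fetchall()


rows = recent_errors(conn, "FI", 3.5, now)
```

    The point of the sketch is that the time-bounded log filter and the lookup against business data live in one ordinary SQL statement, which is the advantage described above.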

    Timescale is excellent on its own, and honestly competitive with other TSDBs. Having access to the full Postgres ecosystem alongside your time series data puts Timescale way ahead of everyone else. My story might change when I hit the limits of what a single Postgres host can ingest, but I'm not even close to that scale yet.

    Another advantage of Timescale is having access to real SQL: you don't have to learn a new domain-specific query language. This admittedly can be a double-edged sword. SQL is more complicated than PromQL / InfluxQL; however, that comes with quite a lot of extra capability, and the ability to transfer that knowledge into other domains.

    I personally really like Timescale, and feel that regardless of anyone's benchmarks, no matter how well thought out or not, the advantages outweigh the disadvantages by a pretty large margin.

  • promscale

    [DEPRECATED, discontinued] Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.

    First, let's define `time series`. It is a series of (timestamp, value) pairs ordered by timestamp. The `value` may contain arbitrary data - a floating-point value, text, JSON, a data structure with many columns, etc. Each time series is uniquely identified by its name plus an optional set of {label="value"} labels. For example, temperature{city="London",country="UK"} or log_stream{host="foobar",datacenter="abc",app="nginx"}.
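
    As a rough illustration of that definition, here is a minimal in-memory sketch. The class names `SeriesKey` and `SeriesStore` are made up for this example and do not come from ClickHouse, TimescaleDB, or VictoriaMetrics. A series is identified by its name plus a set of labels, and holds (timestamp, value) pairs kept in timestamp order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identity of a series: a name plus an unordered set of label pairs."""
    name: str
    labels: frozenset  # frozenset of (label, value) tuples

    @classmethod
    def of(cls, name, **labels):
        return cls(name, frozenset(labels.items()))

    def __str__(self):
        inner = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels))
        return f"{self.name}{{{inner}}}" if inner else self.name


class SeriesStore:
    """Maps each SeriesKey to its (timestamp, value) samples."""

    def __init__(self):
        self._series = {}

    def append(self, key, timestamp, value):
        samples = self._series.setdefault(key, [])
        samples.append((timestamp, value))
        # Sort on the timestamp only, so values may be any type
        # (float, text, dict, ...) without needing to be comparable.
        samples.sort(key=lambda sample: sample[0])

    def samples(self, key):
        return list(self._series.get(key, []))


store = SeriesStore()
london = SeriesKey.of("temperature", city="London", country="UK")
store.append(london, 1635854400, 11.5)
store.append(london, 1635850800, 10.9)  # arrives late, is still stored in order
```

    Because the labels are held in a frozenset, the same labels given in a different order identify the same series.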

    ClickHouse is perfectly optimized for storing and querying such time series, including metrics. It is true that ClickHouse isn't optimized for handling millions of tiny inserts per second. It prefers infrequent batches with a large number of rows in each batch. But this isn't a real problem in practice, because:

    1) ClickHouse provides Buffer table engine for frequent inserts.

    2) It is easy to create a special proxy app or library for data buffering before sending it to ClickHouse.
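
    The second option can be sketched in a few lines. This is only an illustration of the buffering idea, not an existing library: `BatchBuffer` and its parameters are hypothetical, and the `flush` callback stands in for whatever code would send one large INSERT to ClickHouse.

```python
import time


class BatchBuffer:
    """Collects rows and hands them to `flush` in large batches.

    A batch is flushed when it reaches `max_rows` rows, or when the oldest
    buffered row is at least `max_age_seconds` old at the time of an add().
    """

    def __init__(self, flush, max_rows=10000, max_age_seconds=1.0,
                 clock=time.monotonic):
        self._flush = flush            # hypothetical sink, e.g. one bulk INSERT
        self._max_rows = max_rows
        self._max_age = max_age_seconds
        self._clock = clock
        self._rows = []
        self._first_row_at = None

    def add(self, row):
        if not self._rows:
            self._first_row_at = self._clock()
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._first_row_at >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Send whatever is buffered; does nothing when the buffer is empty."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._first_row_at = None
        self._flush(batch)


# Usage: many tiny writes become a few large ones.
sent = []
buffer = BatchBuffer(sent.append, max_rows=1000, max_age_seconds=3600)
for i in range(2500):
    buffer.add((i, "cpu", 0.5))
buffer.flush()  # drain the remainder, e.g. on shutdown
```

    In this sketch the age limit is only checked when a row is added, so a real proxy would also need a timer to flush an idle buffer, plus retry handling for failed sends.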

    TimescaleDB provides Promscale [1] - a service that allows using TimescaleDB as a storage backend for Prometheus. Unfortunately, it doesn't show outstanding performance compared to Prometheus itself or to other remote storage solutions for Prometheus. Promscale requires more disk space, disk IO, CPU and RAM according to production tests [2], [3].

    [1] https://github.com/timescale/promscale

    [2] https://abiosgaming.com/press/high-cardinality-aggregations/

    [3] https://valyala.medium.com/promscale-vs-victoriametrics-reso...

    Full disclosure: I'm CTO at VictoriaMetrics - a competing solution to TimescaleDB. VictoriaMetrics is built on top of architecture ideas from ClickHouse.
