ClickHouse Cloud is now in Public Beta

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ClickBench

    ClickBench: a Benchmark For Analytical Databases

  • https://benchmark.clickhouse.com/

    Good to see transparent comparisons now available for Cloud performance vs. self-hosted or bare-metal results, as well as results from our peers. The ClickHouse team will continue to optimize further, as scale and performance are a relentless pursuit here at ClickHouse, and something we expect to be pursued transparently and in a reproducible manner. Public benchmarking benefits all of us in the tech industry, as we learn from each other by sharing the best techniques for attaining high performance within a cloud architecture.

    Full disclosure: I work for ClickHouse, although I have also been a member of SPEC, developing and advocating for public, standardized benchmarks.

  • clickhouse-operator

    Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse clusters running on Kubernetes

  • but this pricing looks excessive.

    A single node instance with a fast disk is more than sufficient for most needs: https://hub.docker.com/r/clickhouse/clickhouse-server

    If you need a cluster, https://github.com/Altinity/clickhouse-operator makes things easy

  • hosts

    🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.

  • And we request removal as and where we can.

    https://github.com/StevenBlack/hosts/issues/1781

    Unfortunately mvps.org no longer seems to reply to emails.

  • clickhouse-sink-connector

    Replicate data from MySQL, Postgres and MongoDB to ClickHouse

  • Check out the Altinity Sink Connector for ClickHouse [0]. This is advancing quite quickly and already has prod deployments. Please feel free to try it out.

    [0] https://github.com/Altinity/clickhouse-sink-connector

  • matano

    Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS

  • ClickHouse is great, but the ops and scaling make it notoriously difficult to self-host.

    If you have a lot of log data and want something open source and serverless that you can self-host, check out Matano (https://github.com/matanolabs/matano).

  • duckdb

    DuckDB is an in-process SQL OLAP Database Management System

  • The particular way in which the data is loaded into DuckDB, combined with the particular machine configuration on which it is run, triggers a problem in DuckDB related to memory management. Essentially, the standard Linux memory allocator does not like our allocation pattern when doing this load, which causes the system to run out of memory despite freeing more memory than we allocate. More info is provided here [1].

    As it is right now the benchmark is not particularly representative of DuckDB's performance. Check back in a few months :)

    [1] https://github.com/duckdb/duckdb/issues/3969#issuecomment-11...

  • mgbench

  • spyql

    Query data on the command line with SQL-like SELECTs powered by Python expressions

  • https://github.com/dcmoura/spyql/blob/master/notebooks/json_...

    And ClickHouse looks like a normal relational database: there is no need for multiple components for different tiers (like in Druid), no need for manual partitioning into "daily" and "hourly" tables (like you do in Spark and BigQuery), no need for lambda architecture... It's refreshing how something can be both simple and fast.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
