https://benchmark.clickhouse.com/
Good to see transparent comparisons now available for Cloud performance versus self-hosted or bare-metal results, as well as results from our peers. The ClickHouse team will continue to optimize further: scale and performance are a relentless pursuit here at ClickHouse, and something we expect to be measured transparently and reproducibly. Public benchmarking benefits all of us in the tech industry, because we learn from each other by sharing the best techniques for attaining high performance within a cloud architecture.
Full disclosure: I do work for ClickHouse, although I was previously a member of SPEC, where I helped develop and advocate for public, standardized benchmarks.
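One common way to turn per-query timings from several systems into a single comparable number is to take, for each system, the geometric mean of its per-query ratios to the fastest system on that query. The sketch below is illustrative only: it is not claimed to be the exact scoring formula used on benchmark.clickhouse.com, and the `shift` constant (added to every timing so near-zero runtimes don't dominate the ratios) is a hypothetical parameter.

```python
import math


def relative_scores(times: dict[str, list[float]], shift: float = 0.01) -> dict[str, float]:
    """Geometric mean of each system's per-query time relative to the best system.

    `times` maps a system name to its runtimes (seconds), one entry per query,
    with the same query order for every system. A score of 1.0 means the system
    was the fastest on every query; larger is slower.
    """
    systems = list(times)
    n_queries = len(times[systems[0]])
    # Fastest observed time for each query across all systems.
    best = [min(times[s][i] for s in systems) for i in range(n_queries)]
    scores = {}
    for s in systems:
        # Average the log-ratios, then exponentiate: this is the geometric mean.
        log_sum = sum(
            math.log((times[s][i] + shift) / (best[i] + shift)) for i in range(n_queries)
        )
        scores[s] = math.exp(log_sum / n_queries)
    return scores
```

The geometric mean is used here so that one very slow query cannot swamp the summary the way it would with an arithmetic mean of ratios.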
But this pricing looks excessive.
A single node instance with a fast disk is more than sufficient for most needs: https://hub.docker.com/r/clickhouse/clickhouse-server
If you need a cluster, https://github.com/Altinity/clickhouse-operator makes things easy
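For the single-node setup, ClickHouse exposes an HTTP interface (port 8123 by default) that accepts a SQL statement in the `query` URL parameter, so a read-only query needs nothing beyond the Python standard library. This is a minimal sketch that assumes the container's port 8123 is published on `localhost` and the default user needs no password; the helper names `build_query_url` and `run_query` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(sql: str, host: str = "localhost", port: int = 8123) -> str:
    # The SQL text is URL-encoded into the `query` parameter of the HTTP interface.
    return f"http://{host}:{port}/?{urlencode({'query': sql})}"


def run_query(sql: str, host: str = "localhost", port: int = 8123, timeout: float = 10.0) -> str:
    # A plain GET is enough for read-only statements such as SELECT.
    with urlopen(build_query_url(sql, host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With a server running, `run_query("SELECT version()")` should return the server version as text.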
And we request removal as and where we can.
https://github.com/StevenBlack/hosts/issues/1781
Unfortunately mvps.org no longer seems to reply to emails.
Check out the Altinity Sink Connector for ClickHouse [0]. This is advancing quite quickly and already has prod deployments. Please feel free to try it out.
[0] https://github.com/Altinity/clickhouse-sink-connector
ClickHouse is great, but operations and scaling make it notoriously difficult to self-host.
If you have a lot of log data and want something open source and serverless that you can self-host, check out Matano (https://github.com/matanolabs/matano).
The particular way in which the data is loaded into DuckDB, combined with the particular machine configuration it runs on, triggers a memory-management problem in DuckDB. Essentially, the standard Linux memory allocator copes poorly with our allocation pattern during this load, which causes the system to run out of memory even though we free more memory than we allocate. More info is provided here [1].
As it is right now the benchmark is not particularly representative of DuckDB's performance. Check back in a few months :)
[1] https://github.com/duckdb/duckdb/issues/3969#issuecomment-11...
https://github.com/dcmoura/spyql/blob/master/notebooks/json_...
And ClickHouse looks like a normal relational database: there is no need for multiple components for different tiers (like in Druid), no need for manual partitioning into "daily" and "hourly" tables (like you do in Spark and BigQuery), no need for a lambda architecture... It's refreshing how something can be both simple and fast.
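To make "manual partitioning into daily tables" concrete, the sketch below shows the bookkeeping an application takes on when it shards data by day itself. It has to compute a table name for every row it writes, and any query over a date range has to enumerate (and UNION) every daily table. All names here (`events`, `route_rows`, `tables_for_range`) are hypothetical. In ClickHouse, a single MergeTree table with a `PARTITION BY` expression lets the engine handle this instead.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_table_name(base: str, day: date) -> str:
    # e.g. "events_20220701"; works for both date and datetime values.
    return f"{base}_{day:%Y%m%d}"


def route_rows(base: str, rows: list[tuple[datetime, str]]) -> dict[str, list[tuple[datetime, str]]]:
    # The application, not the database, decides which table each row lands in.
    tables: dict[str, list[tuple[datetime, str]]] = defaultdict(list)
    for ts, payload in rows:
        tables[daily_table_name(base, ts)].append((ts, payload))
    return dict(tables)


def tables_for_range(base: str, start: date, end: date) -> list[str]:
    # A query over [start, end] (inclusive) must touch every daily table in between.
    days = (end - start).days
    return [daily_table_name(base, start + timedelta(days=i)) for i in range(days + 1)]
```

Adding "hourly" tables on top of this multiplies the table count by 24, which is the kind of complexity the comment is contrasting ClickHouse against.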