Show HN: ClickHouse-local – a small tool for serverless data analytics

ClickHouse

208 34,054 10.0 C++

ClickHouse® is a free analytics DBMS for big data

I did talk about
> Query language, data types support, feature completeness, stability and testing
and said nothing about correctness.
In terms of stability, I see a couple of pretty old and still unresolved memory safety issues (data races, segmentation faults) in your repository, found by users.
In contrast, most of the memory safety issues in ClickHouse are found by continuous fuzzing before release. And finding similar issues will earn you a reward: https://github.com/ClickHouse/ClickHouse/issues/38986
Our testing system is successfully finding issues in well-known and widely used libraries: jemalloc, rocksdb, grpc, AWS, Arrow, Avro, ZooKeeper, the Linux kernel... It is kind of surprising, and it gives the impression that we are the only product that does testing for real.
I also remember an example of using SQLancer from 1.5 years ago. When SQLancer appeared, we started using it on ClickHouse, and it found a few issues and one crash. At the same time, it found a lot of crashes in DuckDB. But this example is very old, and DuckDB has evolved a lot since then; it is a much younger technology, after all.

duckdb

52 16,356 10.0 C++

DuckDB is an in-process SQL OLAP Database Management System

This summer I was preparing ClickBench: https://benchmark.clickhouse.com/
When I tried to use DuckDB on the same dataset as ClickHouse, it simply did not work due to OOM: https://github.com/duckdb/duckdb/issues/3969
I also told them about our experience with various memory allocators, and why you should never use glibc's malloc.
This issue was fixed.

ClickBench

68 567 9.1 HTML

ClickBench: a Benchmark For Analytical Databases

octosql

34 4,689 4.3 Go

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

Congrats on the Show HN!
It's great to see more tools in this area (querying data from various sources in-place) and the Lambda use case is a really cool idea!
I've recently done a bunch of benchmarking, including ClickHouse Local, and the usage was straightforward, with everything working as it's supposed to.
Just to comment on performance, though: one area where I think ClickHouse could still improve, vs OctoSQL[0] at least, is the JSON datasource, which seems slower, especially if only a small part of each JSON object is used. If only a single field of many is used, OctoSQL lazily parses only that field and skips the others, which yields non-trivial performance gains on big JSON files with small queries.
Basically, for a query like `SELECT COUNT(*), AVG(overall) FROM books.json` with the Amazon Review Dataset, OctoSQL is twice as fast (3s vs 6s). That's a minor thing though (OctoSQL will slow down for more complicated queries, while for ClickHouse decoding the input is and remains the bottleneck).
[0]: https://github.com/cube2222/octosql
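
To make the lazy-parsing idea in the comment above concrete, here is a minimal Python sketch of the same query shape (`COUNT(*)` and `AVG(overall)`) over newline-delimited JSON. It is not OctoSQL's or ClickHouse's actual implementation. The helper names `lazy_field` and `count_and_avg` and the sample rows are made up for illustration, and the key lookup is a deliberate simplification: it takes the first textual occurrence of the key, so it can be fooled when the same text appears inside a string value or a nested object.

```python
import json
import re

_DECODER = json.JSONDecoder()


def lazy_field(line, key):
    """Decode only the value of `key` from one line holding a JSON object.

    Hypothetical helper: locates the first textual '"key":' and decodes just
    the value that follows, without parsing the rest of the object.
    """
    match = re.search(r'"%s"\s*:\s*' % re.escape(key), line)
    if match is None:
        return None
    # raw_decode parses a single JSON value starting at the given index.
    value, _end = _DECODER.raw_decode(line, match.end())
    return value


def count_and_avg(lines, key="overall"):
    """Return (row count, average of `key`), skipping rows where it is missing."""
    count = 0
    seen = 0
    total = 0.0
    for line in lines:
        if not line.strip():
            continue
        count += 1
        value = lazy_field(line, key)
        if value is not None:
            total += value
            seen += 1
    return count, (total / seen if seen else None)


# Made-up sample rows, one JSON object per line.
sample = [
    '{"reviewerID": "A1", "overall": 5.0, "reviewText": "Great book"}',
    '{"reviewerID": "A2", "overall": 3.0, "reviewText": "It was fine"}',
    '{"reviewerID": "A3", "overall": 4.0, "reviewText": "Good"}',
]

print(count_and_avg(sample))  # (3, 4.0)
```

Whether this beats a full `json.loads` per row depends on how wide the objects are; the gain described in the comment comes from skipping the decoding of unused fields, and a production parser would need a real tokenizer rather than a regex.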

textql

15 9,028 3.7 Go

Execute SQL against structured text like CSV or TSV

As the author of textql ( https://github.com/dinedal/textql ) - thanks for the shoutout!
Looks great, I love more options in the space for CLI-based data analysis tools! Fantastic work!

q

46 10,109 3.6 Python

q - Run SQL directly on delimited files and multi-file sqlite databases (by harelba)

I think they're talking about https://github.com/harelba/q, which is not very fast.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Topics: SQL, Database, Analytics, OLAP, CLI
Post date: 5 Jan 2023
