Show HN: ClickHouse-local – a small tool for serverless data analytics

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ClickHouse

    ClickHouse® is a free analytics DBMS for big data

  • What I mentioned was

    > Query language, data types support, feature completeness, stability and testing

    and nothing about correctness.

    In terms of stability, I see a couple of fairly old and still unresolved memory safety issues (data races, segmentation faults) in your repository, reported by users.

    In contrast, most of the memory safety issues in ClickHouse are found by continuous fuzzing before the release. And finding similar issues will earn you a reward: https://github.com/ClickHouse/ClickHouse/issues/38986

    Our testing system successfully finds issues in well-known and widely used libraries: jemalloc, rocksdb, grpc, AWS, Arrow, Avro, ZooKeeper, the Linux kernel... It is kind of surprising, and it gives the impression that we are the only product that does testing for real.

    I also remember an example of using SQLancer from 1.5 years ago. When SQLancer appeared, we started to use it on ClickHouse, and it found a few issues and one crash. At the same time, it found a lot of crashes in DuckDB. But this example is very old, and DuckDB has evolved a lot since then; it is a much younger technology after all.

  • duckdb

    DuckDB is an in-process SQL OLAP Database Management System

  • This summer I was preparing ClickBench: https://benchmark.clickhouse.com/

    When I tried to use DuckDB on the same dataset as ClickHouse, it simply did not work due to OOM: https://github.com/duckdb/duckdb/issues/3969

    I also told them about our experience with various memory allocators, and why you should never use glibc's malloc.

    This issue was fixed.

  • ClickBench

    ClickBench: a Benchmark For Analytical Databases

  • octosql

    OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

  • Congrats on the Show HN!

    It's great to see more tools in this area (querying data from various sources in-place) and the Lambda use case is a really cool idea!

    I've recently done a bunch of benchmarking, including ClickHouse Local, and the usage was straightforward, with everything working as it's supposed to.

    Just to comment on performance though, one area where I think ClickHouse could still improve, vs OctoSQL[0] at least, is the JSON datasource, which seems slower, especially if only a small part of each JSON object is used. If only a single field of many is used, OctoSQL lazily parses only that field and skips the others, which yields non-trivial performance gains on big JSON files with small queries.

    Basically, for a query like `SELECT COUNT(*), AVG(overall) FROM books.json` with the Amazon Review Dataset, OctoSQL is twice as fast (3s vs 6s). That's a minor thing though (OctoSQL will slow down for more complicated queries, while for ClickHouse decoding the input is and remains the bottleneck).

    [0]: https://github.com/cube2222/octosql
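
For reference, a minimal Python sketch of what the query in the comment above computes. It assumes the input is newline-delimited JSON with a numeric `overall` field; the function name `count_and_avg` and the sample records are hypothetical. It is not how ClickHouse or OctoSQL implement the query: it decodes every record in full, which is the eager approach, whereas a lazy parser of the kind the comment describes would skip the fields the query never reads.

```python
import json
from typing import Iterable, Tuple


def count_and_avg(lines: Iterable[str], field: str = "overall") -> Tuple[int, float]:
    """Return (COUNT(*), AVG(field)) over newline-delimited JSON records."""
    count = 0       # every record counts towards COUNT(*)
    non_null = 0    # only records that carry the field count towards AVG
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # eager: decodes all fields, used or not
        count += 1
        value = record.get(field)
        if value is not None:
            total += value
            non_null += 1
    avg = total / non_null if non_null else float("nan")
    return count, avg


# Hypothetical records standing in for lines of books.json.
sample = [
    '{"overall": 5.0, "reviewText": "great"}',
    '{"overall": 3.0, "reviewText": "ok"}',
    '{"overall": 4.0, "reviewText": "good"}',
]
print(count_and_avg(sample))
```

For a real file, the same function can be called with an open file handle, since iterating over a text file yields one line at a time.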

  • textql

    Execute SQL against structured text like CSV or TSV

  • As the author of textql ( https://github.com/dinedal/textql ) - thanks for the shoutout!

    Looks great, I love more options in the space for CLI based data analysis tools! Fantastic work!

  • q

    q - Run SQL directly on delimited files and multi-file sqlite databases (by harelba)

  • I think they're talking about https://github.com/harelba/q, which is not very fast.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
