platform vs ClickHouse

Compare platform vs ClickHouse and see what their differences are.

                   platform             ClickHouse
Mentions           3                    211
Stars              25                   35,054
Growth             -                    2.6%
Activity           10.0                 10.0
Latest commit      about 3 years ago    3 days ago
Language           -                    C++
License            -                    Apache License 2.0
The number of mentions indicates the total number of mentions we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.

platform

Posts with mentions or reviews of platform. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-21.
  • Ask HN: Which books/resources to understand modern Assembler?
    6 projects | news.ycombinator.com | 21 Apr 2024
    The highload.fun wiki[0] links to some resources. The Intel optimization manual[1] is also useful.

    These resources are mostly aimed at solving problems for which compilers are not very useful, so there are probably other resources that are a better fit.

    [0]: https://github.com/Highload-fun/platform/wiki

    [1]: https://www.intel.com/content/www/us/en/content-details/6714...

  • SWAR find any byte from set
    2 projects | news.ycombinator.com | 8 Mar 2023
    The web site containing the article has a collection of other articles about this sort of thing. You can also read about this in some parts of Algorithms for Modern Hardware[0]. highload.fun includes some other links about this stuff[1]. In general there isn't a great guide and it's best to get your hands dirty. highload.fun happens to be a good place to do that :)

    [0]: https://en.algorithmica.org/hpc/algorithms/prefix/

    [1]: https://github.com/Highload-fun/platform/wiki

  • Cache invalidation is one of the hardest problems in computer science
    2 projects | news.ycombinator.com | 26 Nov 2022
    There are a lot of issues here, so I can share some stuff about some of them and hope that some helpful internet commenters come along and point out where I have neglected important things.

    A single modern CPU core is superscalar and has a deep instruction pipeline. With your help, it will decode and reorder many instructions and execute many instructions concurrently. Each of those instructions can operate on a lot of data.

    As famous online controversial opinion haver Casey Muratori tells us, most software just sucks, like really really bad (e.g. commonly people will post hash table benchmarks of high-performance hash tables that do bulk inserts in ~100ns/op, but you can do <10ns/op easily if you try), and using SIMD instructions is table stakes for making good use of the machinery inside of a single CPU core. SIMD instructions are not just for math! They are tools for general purpose programming, and when your program has unpredictable branches in it, it is often a lot cheaper to compute both branches and have a data dependency than to have a branch. Instructions like pshufb or blendv or just using a dang lookup table can replace branches. Wojciech Muła's web site[0] is the best collection of notes about using SIMD instructions for general-purpose programming, but I have found many of the articles to be incomplete or incorrect, and I have not yet done anything to fix the issue. "Using SIMD" ends up meaning that you choose the low-level layout of your data to be more suitable to processing using the instructions available, not just replacing the "glue code" and leaving the data the same as it was.
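
    A minimal sketch of the compare-and-blend pattern, using SSE4.1 intrinsics (illustrative only; for this particular job _mm_max_epi32 exists, but the same shape works for any pair of branch results you compute up front):

        #include <smmintrin.h>   // SSE4.1: _mm_blendv_epi8
        #include <cstdio>

        // Branchless select: out[i] = (a[i] > b[i]) ? a[i] : b[i] for four
        // int32 lanes at once. Both "branches" are already computed; the
        // comparison mask just picks a result per lane, with no jumps.
        static __m128i select_greater(__m128i a, __m128i b) {
            __m128i gt = _mm_cmpgt_epi32(a, b);  // 0xFFFFFFFF where a > b
            return _mm_blendv_epi8(b, a, gt);    // take a where the mask is set
        }

        int main() {
            __m128i a = _mm_setr_epi32(1, 7, 3, 9);
            __m128i b = _mm_setr_epi32(4, 2, 8, 5);
            alignas(16) int out[4];
            _mm_store_si128(reinterpret_cast<__m128i*>(out), select_greater(a, b));
            std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  // 4 7 8 9
        }

    (Build with -msse4.1.)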

    Inside your single CPU core there is hardware for handling virtual -> physical address translation. This is a special cache called the translation lookaside buffer (TLB). Normally, chips other than recent Apple chips have a couple hundred entries of 1 4KiB page each in the TLB, and recent Apple chips have a couple hundred entries of 1 16KiB page each. Normal programs deal with a bit more than 1 meg of RAM today, and as a result they spend a huge portion of their execution time on TLB misses. You can fix this by using explicit huge pages on Linux. This feature nominally exists on Windows but is basically unusable for most programs because it requires the application to run as administrator and because the OS will never compact memory once it is fragmented (so the huge pages must be obtained at startup and never released, or they will disappear until you reboot). I have not tried it on Mac. As an example of a normal non-crazy program that is helped by larger pages, one person noted[1] that Linux builds 16% faster on 16K vs on 4K pages.
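
    A minimal sketch of the Linux side, assuming a 2 MiB huge page size (the explicit MAP_HUGETLB path needs pages reserved via /proc/sys/vm/nr_hugepages; the madvise fallback is only a hint the kernel may ignore):

        #include <sys/mman.h>
        #include <cstdio>

        int main() {
            const size_t len = 2 * 1024 * 1024;  // one 2 MiB huge page
            // Explicit huge page; fails with ENOMEM if none are reserved.
            void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED) {
                // Fall back to normal pages and ask for transparent huge pages.
                p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED) { perror("mmap"); return 1; }
                madvise(p, len, MADV_HUGEPAGE);
            }
            // ... place hot data structures in p ...
            munmap(p, len);
        }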

    Inside your single CPU core is a small hierarchy of set-associative caches. With your help, it will have the data it needs in cache almost all the time! An obvious aspect of this is that when you need to work on some data repeatedly, if you have a choice, you should do it before you have worked on a bunch of other data and caused that earlier data to be evicted (that is, you can rearrange your work to avoid "capacity misses"). A less obvious aspect of this is that if you operate on data that is too-aligned, you will greatly reduce the effective size of your cache, because all the data you are using will go into the same tiny subset of your cache! Famous online good opinion haver Dan Luu wrote about this here[2]. The links included in that post are also good resources for the topics you've asked about.
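
    A sketch of the too-aligned failure mode: summing a "column" of a row-major matrix whose row stride is a large power of two keeps hitting the same cache sets, while padding the stride by one cache line spreads the accesses out (sizes are illustrative; the effect depends on your cache geometry):

        #include <cstdint>
        #include <vector>

        constexpr size_t kRows = 1024;
        constexpr size_t kCols = 4096;  // 16 KiB stride: column walks alias to few sets
        constexpr size_t kPad  = 16;    // one extra 64 B line per row breaks the aliasing

        uint32_t sum_column(const std::vector<uint32_t>& m, size_t stride, size_t col) {
            uint32_t s = 0;
            for (size_t r = 0; r < kRows; ++r) s += m[r * stride + col];
            return s;
        }

        int main() {
            std::vector<uint32_t> conflict(kRows * kCols, 1);
            std::vector<uint32_t> padded(kRows * (kCols + kPad), 1);
            // Same answer, very different conflict-miss behavior.
            return sum_column(conflict, kCols, 0) == sum_column(padded, kCols + kPad, 0) ? 0 : 1;
        }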

    When coordinating between multiple CPU cores, as noted in TFA, it is helpful to avoid false sharing[3]. People in industry have mostly found that it is helpful to avoid sharing *at all*, which is why they have work explicitly divided among cores and communicate over queues rather than dumping things into a concurrent hash map and hoping things work out. In general this is not a popular practice, and if you go online and post stuff like "Well, just don't allocate any memory after startup and don't pass any data between threads other than by using queues" you will lose imaginary internet points.
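
    A sketch of the false-sharing half of that: each thread owns its own counter, but without the alignment both atomics can land on one 64-byte line and the cores ping-pong it between their caches.

        #include <atomic>
        #include <cstdint>
        #include <cstdio>
        #include <thread>

        // alignas(64) gives each counter its own cache line.
        struct alignas(64) PaddedCounter {
            std::atomic<uint64_t> value{0};
        };

        PaddedCounter counters[2];

        int main() {
            auto work = [](int i) {
                for (int n = 0; n < 10000000; ++n)
                    counters[i].value.fetch_add(1, std::memory_order_relaxed);
            };
            std::thread a(work, 0), b(work, 1);
            a.join(); b.join();
            std::printf("%llu %llu\n",
                        (unsigned long long)counters[0].value,
                        (unsigned long long)counters[1].value);
        }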

    There are some incantations you may want to apply if you would like Linux to prioritize running your program, which are documented in the Red Hat Low Latency Performance Tuning guide[4] and Erik Rigtorp's web site[5].
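
    Those incantations mostly boil down to pinning a thread to a core and raising its scheduling class; a minimal sketch (the core number and priority are placeholders, and SCHED_FIFO requires CAP_SYS_NICE or root):

        #include <pthread.h>
        #include <sched.h>
        #include <cstdio>
        #include <cstring>

        int main() {
            // Pin the calling thread to core 2.
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(2, &set);
            if (sched_setaffinity(0, sizeof(set), &set) != 0)
                perror("sched_setaffinity");

            // Request the FIFO real-time scheduling class.
            sched_param sp{};
            sp.sched_priority = 80;
            if (int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp))
                std::fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));

            // ... latency-critical work runs here ...
        }

    (Build with -pthread.)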

    Various other resources: highload.fun[6], a web site where you can practice this sort of thing; a list of links associated with highload.fun[7]; Sergey Slotin's excellent online book[8]; and Denis Bakhvalov's online course[9] and blog[10].

    > Off topic: What level of sophistication about modern CPUs is _good_ to have?

    Probably none? These skills are basically unemployable as far as I can tell.

    [0]: http://0x80.pl/articles/index.html

    [1]: https://twitter.com/AtTheHackOfDawn/status/13338951151741870...

    [2]: https://danluu.com/3c-conflict/

    [3]: https://rigtorp.se/ringbuffer/

    [4]: https://access.redhat.com/sites/default/files/attachments/20...

    [5]: https://rigtorp.se/low-latency-guide/

    [6]: https://highload.fun/

    [7]: https://github.com/Highload-fun/platform/wiki

    [8]: https://en.algorithmica.org/hpc/

    [9]: https://github.com/dendibakh/perf-ninja

    [10]: https://easyperf.net/notes/

ClickHouse

Posts with mentions or reviews of ClickHouse. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-05-24.
  • Universal Data Migration: Using Slingdata to Transfer Data Between Databases
    2 projects | dev.to | 24 May 2024
    ClickHouse installed and running.
  • Simplified API Creation and Management: ClickHouse to APISIX Integration Without Code
    3 projects | dev.to | 22 May 2024
    In the world of data management and web services, creating and managing APIs can often be a complex and time-consuming task. However, with the right tools, this process can be significantly simplified. In this article, we will explore how to create APIs for fetching data from ClickHouse tables without writing any code and manage these APIs using APISIX. ClickHouse, a fast and open-source columnar database management system, provides an HTTP interface by default, enabling easy access to data. By integrating this with APISIX, an open-source API gateway, we can not only manage and log our APIs but also leverage a host of features provided by APISIX to enhance our API management capabilities.
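
    As a sketch of how little is needed before any gateway is involved: ClickHouse's HTTP interface answers a plain POSTed query on the default port 8123. The host and query below are placeholders; this uses libcurl.

        #include <curl/curl.h>
        #include <cstdio>

        static size_t on_body(char* data, size_t size, size_t nmemb, void*) {
            fwrite(data, size, nmemb, stdout);  // stream the response through
            return size * nmemb;
        }

        int main() {
            curl_global_init(CURL_GLOBAL_DEFAULT);
            CURL* curl = curl_easy_init();
            curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8123/");
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS,
                             "SELECT count() FROM system.tables FORMAT JSON");
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);
            CURLcode rc = curl_easy_perform(curl);
            if (rc != CURLE_OK)
                fprintf(stderr, "curl: %s\n", curl_easy_strerror(rc));
            curl_easy_cleanup(curl);
            curl_global_cleanup();
            return rc == CURLE_OK ? 0 : 1;
        }
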
  • The new APT 3.0 solver
    8 projects | news.ycombinator.com | 14 May 2024
    I've made a library named "glibc-compatibility": https://github.com/ClickHouse/ClickHouse/tree/master/base/gl...

    When linking with this library, the resulting binary will not depend on the new symbol versions. It will run on glibc 2.4 and on systems as old as Ubuntu 8.04 and CentOS 5 even when built on the most modern system.
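
    One common mechanism for this kind of shim (not necessarily exactly what the ClickHouse library does) is pinning a symbol to an old version with a .symver directive, so the link step records the old versioned symbol instead of the newest one:

        #include <cstring>

        // Resolve memcpy against GLIBC_2.2.5 rather than the newer
        // memcpy@GLIBC_2.14 the headers would otherwise bind to.
        __asm__(".symver memcpy, memcpy@GLIBC_2.2.5");

        int main() {
            char dst[16];
            std::memcpy(dst, "hello", 6);  // links against the old symbol
            return dst[0] == 'h' ? 0 : 1;
        }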

  • We Built a 19 PiB Logging Platform with ClickHouse and Saved Millions
    1 project | news.ycombinator.com | 2 Apr 2024
    Yes, we are working on it! :) Taking some of the learnings from current experimental JSON Object datatype, we are now working on what will become the production-ready implementation. Details here: https://github.com/ClickHouse/ClickHouse/issues/54864

    Variant datatype is already available as experimental in 24.1, Dynamic datatype is WIP (PR almost ready), and JSON datatype is next up. Check out the latest comment on that issue with how the Dynamic datatype will work: https://github.com/ClickHouse/ClickHouse/issues/54864#issuec...

  • Build time is a collective responsibility
    2 projects | news.ycombinator.com | 24 Mar 2024
    In our repository, I've set up a few hard limits: each translation unit cannot spend more than a certain amount of memory or CPU time on compilation, and the compiled binary must be no larger than a certain size.

    When these limits are reached, the CI stops working, and we have to remove the bloat: https://github.com/ClickHouse/ClickHouse/issues/61121

    These limits are still too generous as of today: for example, the maximum CPU time to compile a translation unit is set to 1000 seconds, and the memory limit is 5 GB, which is ridiculously high.

  • Fair Benchmarking Considered Difficult (2018) [pdf]
    2 projects | news.ycombinator.com | 10 Mar 2024
    I have a project dedicated to this topic: https://github.com/ClickHouse/ClickBench

    It is important to explain the limitations of a benchmark, provide a methodology, and make it reproducible. It also has to be simple enough, otherwise it will not be realistic to include a large number of participants.

    I'm also collecting all database benchmarks I could find: https://github.com/ClickHouse/ClickHouse/issues/22398

  • How to choose the right type of database
    15 projects | dev.to | 28 Feb 2024
    ClickHouse: A fast open-source column-oriented database management system. ClickHouse is designed for real-time analytics on large datasets and excels in high-speed data insertion and querying, making it ideal for real-time monitoring and reporting.
  • Writing UDF for Clickhouse using Golang
    2 projects | dev.to | 27 Feb 2024
    Today we're going to create a UDF (user-defined function) in Golang that can be run inside a ClickHouse query. This function will parse a UUID v1 and return its timestamp, since ClickHouse doesn't have this function for now. Inspired by the Python version, it uses the TabSeparated delimiter (since it's the easiest to parse); a UDF in ClickHouse reads line by line (each row is a line, and each tab-separated field is a column/cell value):
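
    The post implements this in Go; since the protocol is just lines on stdin and stdout, here is a sketch of the same loop in C++ (field offsets follow the standard 8-4-4-4-12 UUID layout, 122192928000000000 is the usual count of 100 ns ticks between the Gregorian epoch and the Unix epoch, and error handling is elided):

        #include <cstdint>
        #include <cstdio>
        #include <iostream>
        #include <string>

        int main() {
            std::string uuid;
            while (std::getline(std::cin, uuid)) {  // one TabSeparated row per line
                // UUIDv1: time_low(8)-time_mid(4)-time_hi_and_version(4)-...
                uint64_t lo  = std::stoull(uuid.substr(0, 8), nullptr, 16);
                uint64_t mid = std::stoull(uuid.substr(9, 4), nullptr, 16);
                uint64_t hi  = std::stoull(uuid.substr(14, 4), nullptr, 16) & 0x0FFF;
                uint64_t ticks = (hi << 48) | (mid << 32) | lo;  // 100 ns since 1582-10-15
                uint64_t unix_sec = (ticks - 122192928000000000ULL) / 10000000ULL;
                std::printf("%llu\n", (unsigned long long)unix_sec);
                std::fflush(stdout);  // ClickHouse expects an answer per input row
            }
        }
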
  • The 2024 Web Hosting Report
    37 projects | dev.to | 20 Feb 2024
    For the third, examples here might be analytics plugins in specialized databases like ClickHouse, data transformations in places like your ETL pipeline using Airflow or Fivetran, or special integrations in your authentication workflow with Auth0 hooks and rules.
  • Choosing Between a Streaming Database and a Stream Processing Framework in Python
    10 projects | dev.to | 10 Feb 2024
    Online analytical processing (OLAP) databases like Apache Druid, Apache Pinot, and ClickHouse shine in addressing user-initiated analytical queries. You might write a query to analyze historical data to find the most-clicked products over the past month efficiently using OLAP databases. When contrasting with streaming databases, they may not be optimized for incremental computation, leading to challenges in maintaining the freshness of results. The query in the streaming database focuses on recent data, making it suitable for continuous monitoring. Using streaming databases, you can run queries like finding the top 10 sold products where the “top 10 product list” might change in real-time.

What are some alternatives?

When comparing platform and ClickHouse you can also consider the following projects:

perf-ninja - This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

loki - Like Prometheus, but for logs.

duckdb - DuckDB is an analytical in-process SQL database management system

Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL

VictoriaMetrics - VictoriaMetrics: fast, cost-effective monitoring solution and time series database

TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

datafusion - Apache DataFusion SQL Query Engine

RocksDB - A library that provides an embeddable, persistent key-value store for fast storage.

materialize - The data warehouse for operational workloads.

PostgreSQL - Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

TileDB - The Universal Storage Engine
