semi_index
ClickHouse
Metric | semi_index | ClickHouse
---|---|---
Mentions | 1 | 226
Stars | 57 | 37,064
Growth | - | 1.7%
Activity | 10.0 | 10.0
Last commit | almost 12 years ago | 1 day ago
Language | C++ | C++
License | GNU General Public License v3.0 or later | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
-
Kotlin DataFrame ❤️ Arrow
ClickHouse is a high-performance, column-oriented SQL database management system (DBMS) designed for online analytical processing (OLAP). ClickHouse allows using Arrow Stream as an output format.
- Vecint: Average Color
-
Clickhouse for Embedded Analytics: First Impressions and Unexpected Challenges
We started to look for alternatives and quickly landed at Clickhouse.
-
Lessons Learned #2: Your new feature could introduce a security vulnerability to your old feature (Clickhouse CVE-2024-22412)
In today’s story, we will discuss CVE-2024-22412, which affected ClickHouse, a popular open-source column-oriented database management system typically used for real-time online analytical processing (OLAP). You can find the full write-up of the vulnerability here.
-
Show HN: Insights.hn – Real-time Hacker News posts and comments analytics
This is really great!
I can suggest more ideas that will be easy to add:
- a spark line or heat map of upvotes for every thread: https://github.com/ClickHouse/ClickHouse/issues/59020
- a built-in SQL editor for custom queries;
If you need help in supporting or hosting it, write to milovidov at clickhouse.com
-
How we Built 300μs Typo Correction for 1.3M Words in Rust
We chose ClickHouse to store the dictionary as we ran into deadlock and performance issues with Postgres writes as we scaled the number of workers. ClickHouse's async inserts are fantastic for this task and allowed us to ingest the entire 38M+ document dataset in < 1hr.
- ClickHouse: New JSON data type and semistructured columns
-
Garage: Open-Source Distributed Object Storage
Minio is fairly easy to set up locally or in CI.
We use it for CI in ClickHouse, for example: https://github.com/ClickHouse/ClickHouse/blob/master/docker/...
-
If you're using Polyfill.io code on your site – like 100k are – remove it
PS. If you want to know about this dataset, check https://github.com/ClickHouse/ClickHouse/issues/18842
-
A brief introduction to interval arithmetic
This is a good article!
In ClickHouse, interval arithmetic is applied to index analysis. A sparse index consists of granules, and each granule is an interval of tuples in lexicographic order. This interval is decomposed into a union of hyperrectangles. Conditions such as comparisons, logic operators, and many other functions are evaluated on these hyperrectangles, yielding boolean intervals. Boolean intervals represent ternary logic (always true, always false, can be true or false). Interesting tricks include: applying functions that are monotonic on ranges (for example, the function "day of month" is monotonic as long as the month does not change), calculating function preimages on intervals, and even calculating preimages of n-ary functions, which is useful for space-filling curves, such as Morton or Hilbert curves.
Check for more details: https://github.com/ClickHouse/ClickHouse/blob/master/src/Sto...
Or see examples, such as https://adsb.exposed/
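The comment above describes evaluating conditions over whole intervals rather than individual values. A minimal sketch of that idea follows, assuming one-dimensional integer intervals and a single condition of the form `toDayOfMonth(d) < N`. All names here (`Tri`, `Interval`, `less_than`, `day_of_month_image`, `can_skip_granule`) are hypothetical and chosen for illustration; this is not ClickHouse's actual code, which works on hyperrectangles of tuples.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Tri(Enum):
    """Ternary verdict for a condition evaluated over a whole interval."""
    ALWAYS_FALSE = "always false"
    ALWAYS_TRUE = "always true"
    MAYBE = "can be true or false"


@dataclass(frozen=True)
class Interval:
    """Closed integer interval [lo, hi] (illustrative type, not a ClickHouse one)."""
    lo: int
    hi: int


def less_than(values: Interval, constant: int) -> Tri:
    """Evaluate `x < constant` for every x in `values` at once."""
    if values.hi < constant:
        return Tri.ALWAYS_TRUE
    if values.lo >= constant:
        return Tri.ALWAYS_FALSE
    return Tri.MAYBE


def tri_and(a: Tri, b: Tri) -> Tri:
    """Logical AND lifted to ternary verdicts."""
    if Tri.ALWAYS_FALSE in (a, b):
        return Tri.ALWAYS_FALSE
    if a is Tri.ALWAYS_TRUE and b is Tri.ALWAYS_TRUE:
        return Tri.ALWAYS_TRUE
    return Tri.MAYBE


def day_of_month_image(first: date, last: date) -> Interval:
    """Image of the date range [first, last] under "day of month".

    The function is monotonic only while the month does not change, so the
    endpoints can be mapped directly in that case. Otherwise this sketch
    falls back to the full range of possible days.
    """
    if (first.year, first.month) == (last.year, last.month):
        return Interval(first.day, last.day)
    return Interval(1, 31)


def can_skip_granule(first: date, last: date, max_day_exclusive: int) -> bool:
    """A granule can be skipped only if the condition is false for all its rows."""
    verdict = less_than(day_of_month_image(first, last), max_day_exclusive)
    return verdict is Tri.ALWAYS_FALSE
```

For a granule covering 5 to 9 March 2024 and the condition "day of month < 4", the image is the interval [5, 9], the verdict is "always false", and the granule can be skipped. A granule spanning 28 March to 2 April crosses a month boundary, so the sketch widens the image to [1, 31] and the granule has to be read.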
What are some alternatives?
jq-zsh-plugin - jq zsh plugin
loki - Like Prometheus, but for logs.
json-toolkit - "the best opensource converter I've found across the Internet" -- dene14
DuckDB - DuckDB is an analytical in-process SQL database management system
json-buffet
Trino - Official repository of Trino, the distributed SQL query engine for big data, former
reddit_mining
VictoriaMetrics - VictoriaMetrics: fast, cost-effective monitoring solution and time series database
json-streamer - A fast streaming JSON parser for Python that generates SAX-like events using yajl
RocksDB - A library that provides an embeddable, persistent key-value store for fast storage.
xsv - A fast CSV command line toolkit written in Rust.
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.