arroyo
tensorbase
Metric | arroyo | tensorbase
---|---|---
Mentions | 13 | 1
Stars | 3,326 | 1,429
Growth | 3.2% | 0.4%
Activity | 9.6 | 0.0
Last commit | 6 days ago | about 2 years ago
Language | Rust | Rust
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
arroyo
- FLaNK AI Weekly 18 March 2024
- Arroyo 0.8 released - streaming SQL engine
- Query Engines: Push vs. Pull
Interesting - I looked into your code a bit. I found your window aggregation library [1]. You may be interested in looking into the Rust implementation of some of the research work I've been a part of [2].
In Flink, I believe the reason they need to implement their own backpressure system is that they multiplex TCP connections. That is, they have multiple logical streams flowing through a single TCP connection. If that's the case, you need to do some work to 1) detect which logical stream is the one that's blocking, and 2) avoid blocking the whole connection, because other logical streams may still be able to use it.
Thinking it through, I think what Flink's approach buys is not necessarily better performance, but just a more manageable number of connections. That is, imagine you have a process P1 with operators A, B and C, and a process P2 with operators D, E and F. Now imagine that this is a shuffle, where A, B and C are fully connected to D, E and F. In my old system, you would have 9 TCP connections. In Flink, you will have 1.
[1] https://github.com/ArroyoSystems/arroyo/blob/master/arroyo-w...
- Arroyo
- Show HN: Arroyo – Write SQL on streaming data
- Release v0.3.0 · ArroyoSystems/arroyo - Stream Processing Engine
- Arroyo 0.2 released - Rust stream processing engine, now on Kubernetes
- Distributed stream processing engine written in Rust
- ArroyoSystems/arroyo: Arroyo is a distributed stream processing engine written in Rust
- Arroyo, a new open-source SQL stream processing engine written in Rust
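The comment quoted above under "Query Engines: Push vs. Pull" contrasts one TCP connection per operator pair with a single multiplexed connection that needs its own per-stream backpressure. The Rust sketch below illustrates that trade-off under stated assumptions: it uses a simple credit counter per logical stream, which is one common way to do this and is not taken from Flink's or Arroyo's code. `MuxConnection`, `dedicated_connections`, and the in-memory `wire` vector are all hypothetical names invented for this illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// A logical stream, identified by (sending operator, receiving operator).
type StreamId = (char, char);

/// Connections needed when every operator pair gets its own TCP connection.
fn dedicated_connections(senders: usize, receivers: usize) -> usize {
    senders * receivers
}

/// One physical connection shared by many logical streams (hypothetical type).
/// Each stream has its own credit count, so a stalled stream does not
/// stop the others from using the connection.
struct MuxConnection {
    credits: HashMap<StreamId, usize>,
    pending: HashMap<StreamId, VecDeque<String>>,
    /// Stand-in for the real socket: records actually written out.
    wire: Vec<(StreamId, String)>,
}

impl MuxConnection {
    fn new(streams: &[StreamId], initial_credits: usize) -> Self {
        MuxConnection {
            credits: streams.iter().map(|&s| (s, initial_credits)).collect(),
            pending: streams.iter().map(|&s| (s, VecDeque::new())).collect(),
            wire: Vec::new(),
        }
    }

    /// Write queued records for one stream while it still has credits.
    fn flush(&mut self, id: StreamId) {
        let credits = self.credits.entry(id).or_insert(0);
        let queue = self.pending.entry(id).or_default();
        while *credits > 0 {
            match queue.pop_front() {
                Some(record) => {
                    self.wire.push((id, record));
                    *credits -= 1;
                }
                None => break,
            }
        }
    }

    /// Queue a record and try to send it. Returns false if the stream is
    /// out of credits, meaning only this stream is backpressured.
    fn send(&mut self, id: StreamId, record: &str) -> bool {
        self.pending.entry(id).or_default().push_back(record.to_string());
        self.flush(id);
        self.pending[&id].is_empty()
    }

    /// The receiver grants more credits to one stream, unblocking it.
    fn grant(&mut self, id: StreamId, n: usize) {
        *self.credits.entry(id).or_insert(0) += n;
        self.flush(id);
    }

    /// Streams that currently have records waiting for credits.
    fn blocked(&self) -> Vec<StreamId> {
        let mut ids: Vec<StreamId> = self
            .pending
            .iter()
            .filter(|(_, queue)| !queue.is_empty())
            .map(|(&id, _)| id)
            .collect();
        ids.sort();
        ids
    }
}

fn main() {
    // P1 runs A, B, C; P2 runs D, E, F; the shuffle connects every pair.
    let senders = ['A', 'B', 'C'];
    let receivers = ['D', 'E', 'F'];
    let streams: Vec<StreamId> = senders
        .iter()
        .flat_map(|&s| receivers.iter().map(move |&r| (s, r)))
        .collect();

    println!(
        "dedicated: {} connections",
        dedicated_connections(senders.len(), receivers.len())
    );
    println!("multiplexed: 1 connection, {} logical streams", streams.len());

    let mut conn = MuxConnection::new(&streams, 1);
    conn.send(('A', 'D'), "a1"); // uses A->D's only credit
    conn.send(('A', 'D'), "a2"); // A->D is now backpressured
    conn.send(('B', 'E'), "b1"); // B->E still gets through
    println!("blocked streams: {:?}", conn.blocked());

    conn.grant(('A', 'D'), 1); // receiver frees a buffer for A->D
    println!("records on the wire: {}", conn.wire.len());
}
```

In this sketch the cost of sharing one connection is the extra bookkeeping in `credits` and `pending`. With a dedicated connection per operator pair, TCP's own flow control does that work, at the price of nine connections instead of one.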
tensorbase
- ToyDB: Distributed SQL Database in Rust
+ The result of TB's architectural performance: the untuned write throughput of TB is ~2x faster than that of CH in the Rust driver bench, or ~70% faster when using CH's own `clickhouse-client` command. Use [this parallel script](https://github.com/tensorbase/tools/blob/main/import_csv_to_...) to try it yourself!
3. Thanks to Arrow DataFusion, TensorBase supports a good part of TPC-H. [Untuned TPC-H Q1 result here](https://github.com/tensorbase/benchmarks/blob/main/tpch.md).
4. In simple (no-groupby) aggregation, TensorBase is several times faster than ClickHouse. [Benchmark here](https://github.com/tensorbase/benchmarks/blob/main/quick.md).
5. For complex groupby aggregations, we recently helped boost the speed of the TB engine to the same level as ClickHouse (not released yet, but coming soon).
6. TB will soon support the MySQL wire protocol, distributed query, adaptive columnar storage optimization, and more. Watch the [issues here](https://github.com/tensorbase/tensorbase/issues).
Finally, it is really great to build an AP database in Rust. You are welcome to join!
Disclaimer: I am the author of TensorBase.
What are some alternatives?
bytewax - Python Stream Processing
awesome-bigdata - A curated list of awesome big data frameworks, resources and other awesomeness.
risingwave - SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.
tools
Benthos - Fancy stream processing made operationally mundane
benchmarks
cli - Railway CLI
gitplay - Learn how a software project (using git) evolved over time from its commit log. It's like YouTube for a git project. Desktop app built with Rust and SolidJS.
feldera - Feldera Continuous Analytics Platform
toydb - Distributed SQL database in Rust, written as a learning project
timely-dataflow - A modular implementation of timely dataflow in Rust
naphtha - Universal database connection layer for your application in Rust. Implements the most common functions insert, update and remove for database connections. Change the database without having to adjust your code. Specific models can be stored in different databases. Query models by property. Migrations in pure Rust and available during runtime.