tikv
arrow-datafusion
| | tikv | arrow-datafusion |
|---|---|---|
| Mentions | 18 | 43 |
| Stars | 12,563 | 3,116 |
| Growth | 1.4% | 6.0% |
| Activity | 9.6 | 9.6 |
| Last commit | 5 days ago | 5 days ago |
| Language | Rust | Rust |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tikv
- Go devs that learned Rust, what are your thoughts on it?
-
Apache Pegasus – A distributed key-value storage system
TiKV is basically a layer on top of rocksdb https://github.com/tikv/tikv/blob/956610725039835557e7516828...
-
Surrealdb – FOSS document-graph database, for the realtime web in Rust
> Many,many smart people…
If you look inside the code, you can see that the stated features are a result of the underlying engine (TiKV [0], also in C and Rust, from PingCAP). SurrealDB is standing on the shoulders of giants at present: TiKV, FoundationDB and RocksDB. The feature set they mention mostly comes from TiKV at present.
-
Cloud database for tomorrow's applications (written in Rust)
Hi Diggsey, great question. We are currently focussed on functionality and stability, and then will draw our attention to performance. Coming this week we have a RocksDB storage implementation. We've only just launched our initial beta version, and we know there is a lot of improvement and work to be done (some of these performance issues we know about already and are on our GitHub issues list).
With regards to the consistency/isolation model, SurrealDB sits on top of a number of key-value stores. By using the distributed highly-available TiKV storage backend, https://tikv.org, (and we have a FoundationDB integration in the works), the database is designed to be highly-scalable and highly-available. The same guarantees (albeit just single-node, so no high-availability or scalability) will be available with the RocksDB implementation coming this week. By sitting on top of these key-value stores, SurrealDB ensures that all transactions are ACID compliant. We don't want to go for speed (for instance by writing to /dev/null) over anything, but want SurrealDB to be a reliable and performant backend for any application. Obviously we have a way to go to catch up with PostgreSQL (launched in 1996), but we will strive to get there!
-
CeresDB: A high-performance, distributed, schema-less and time-series database
If you are looking for a production-ready distributed store written in Rust, check out TiKV (https://github.com/tikv/tikv), which is also mentioned in the acknowledgements section of the project's README.
There's also a full-featured distributed RDBMS called TiDB built on top of TiKV.
-
Fly.io – Free Postgres Databases (and free storage volumes, up to 3GB total)
Fair enough. Indeed I didn't consider support costs. Thank you for your answer!
Actually let me ask another thing. Your FAQ mentions you're considering hosting CockroachDB as a drop-in distributed replacement for PostgreSQL [0], and also you currently offer a distributed, eventually consistent PostgreSQL replication solution [1].
Is either TiKV [2] (a distributed key-value store) or TiDB [3] (a distributed database with a MySQL interface, built on top of TiKV) on your radar?
You already offer Redis as a key-value store, but TiKV has an amazing property: it ensures strong consistency globally (not eventual consistency). TiDB, being built on top of TiKV, also has strong consistency.
[0] https://fly.io/blog/fly-answers-questions/#q-what-is-fly-doi...
[1] https://fly.io/blog/globally-distributed-postgres/
-
NoSQL and Key-Value storage systems based on Rust (Redis and Tarantool replacements in Rust)
tikv — A distributed KV database in Rust
-
Learning Rust 01 - Getting to Know the Rust Programming Language
TiKV: a distributed transactional key-value database.
-
Dive Deep into TiKV Transactions: The Life Story of a TiKV Prewrite Request
Before I introduce this phase, I'd like to talk about the batch system. It is the cornerstone of TiKV's multi-raft implementation.
TiKV is a distributed key-value storage engine, which is based on the designs of Google Spanner, F1, and HBase. However, TiKV is much simpler to manage because it does not depend on a distributed file system.
arrow-datafusion
- Using Rust to write a Data Pipeline. Thoughts. Musings.
-
Demystifying Apache Arrow
You can see some of the benchmarks in DataFusion (part of the Arrow project and built with Arrow as the underlying in-memory format) https://github.com/apache/arrow-datafusion/blob/master/bench...
Disclaimer: I'm a committer on the Arrow project and contributor to DataFusion.
-
Scala or Rust? which one will rule in future?
polars and datafusion seem very promising
-
Welcome to Comprehensive Rust
Rust has amazing integration with Python through PyO3 [1], so see it as a safe alternative for high-performance calculations. The ecosystem itself is starting to come together, with exciting projects like Polars [2] (a Pandas alternative), nalgebra [3], DataFusion [4] and Ballista [5].
[1] https://github.com/PyO3/pyo3
[2] https://github.com/pola-rs/polars/
[3] https://docs.rs/nalgebra/latest/nalgebra/
-
Command-line data analytics made easy
It could be the NDJSON parser (DF source: [0]) or could be a variety of other factors. Looking at the ROAPI release archive [1], it doesn't ship with the definitive `columnq` binary from your comment, so it could also have something to do with compilation-time flags.
FWIW, we use the Parquet format with DataFusion and get very good speeds similar to DuckDB [2], e.g. 1.5s to run a more complex aggregation query `SELECT date_trunc('month', tpep_pickup_datetime) AS month, COUNT(*) AS total_trips, SUM(total_amount) FROM tripdata GROUP BY 1 ORDER BY 1 ASC` on a 55M row subset of NY Taxi trip data.
[0]: https://github.com/apache/arrow-datafusion/blob/master/dataf...
[1]: https://github.com/roapi/roapi/releases/tag/roapi-v0.8.0
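The shape of the aggregation query above can be tried without DataFusion or the taxi dataset. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in engine. The four rows are made up for illustration, and SQLite's `strftime` takes the place of `date_trunc`, which SQLite does not provide. It says nothing about DataFusion's performance.

```python
import sqlite3

# Toy stand-in for the taxi data; these four rows are made up for illustration.
rows = [
    ("2022-01-03 08:15:00", 12.5),
    ("2022-01-17 19:40:00", 30.0),
    ("2022-02-01 00:05:00", 8.25),
    ("2022-02-14 13:30:00", 21.75),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tripdata (tpep_pickup_datetime TEXT, total_amount REAL)")
conn.executemany("INSERT INTO tripdata VALUES (?, ?)", rows)

# SQLite has no date_trunc, so strftime('%Y-%m-01', ...) truncates to the month.
query = """
    SELECT strftime('%Y-%m-01', tpep_pickup_datetime) AS month,
           COUNT(*) AS total_trips,
           SUM(total_amount) AS total_amount
    FROM tripdata
    GROUP BY 1
    ORDER BY 1 ASC
"""
monthly = conn.execute(query).fetchall()
for month, total_trips, total_amount in monthly:
    print(month, total_trips, total_amount)
```

Each result row holds the first day of the month, the number of trips in that month, and the summed fare amount.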
SPyQL is really cool and its design is very smart, with it being able to leverage normal Python functions!
As far as similar tools go, I recommend taking a look at DataFusion[0], dsq[1], and OctoSQL[2].
DataFusion is a very (very very) fast command-line SQL engine but with limited support for data formats.
dsq is based on SQLite, which means it has to load data into SQLite first, but then gives you the whole breadth of SQLite. It also supports many data formats, but is slower.
OctoSQL is faster, extensible through plugins, and supports incremental query execution, so you can e.g. calculate a running group by + count while tailing a log file. It also supports normal databases, not just file formats, so you can e.g. join with a Postgres table.
[0]: https://github.com/apache/arrow-datafusion
[1]: https://github.com/multiprocessio/dsq
[2]: https://github.com/cube2222/octosql
Disclaimer: Author of OctoSQL
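The "running group by + count while tailing a log file" idea mentioned above can be illustrated in a few lines of plain Python. This is only a sketch of the concept, not how OctoSQL implements it: the function name `running_group_count`, the log format (first token is the level), and the sample lines are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def running_group_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Yield an updated snapshot of per-key counts after every input line."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        level = parts[0]  # hypothetical log format: the first token is the level
        counts[level] += 1
        yield dict(counts)


# Made-up log lines; in practice `lines` could be a file object being tailed.
log = ["INFO start", "WARN disk", "INFO request", "", "ERROR timeout", "INFO request"]
snapshots = list(running_group_count(log))
print(snapshots[-1])
```

Because the function is a generator, a consumer sees a fresh result after every line instead of waiting for the input to end, which is the point of incremental execution.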
-
Welcome to InfluxDB IOx: InfluxData’s New Storage Engine
Just wanted to give a shout out to Apache DataFusion[0] that IOx relies on a lot (and contributes to as well!).
It's a framework for writing query engines in Rust that takes care of a lot of heavy lifting around parsing SQL, type casting, constructing and transforming query plans and optimizing them. It's pluggable, making it easy to write custom data sources, optimizer rules, query nodes etc.
It has very good single-node performance (there's even a way to compile it with SIMD support), and Ballista [1] extends it into a distributed query engine.
Plenty of other projects use it besides IOx, including VegaFusion, ROAPI, Cube.js's preaggregation store. We're heavily using it to build Seafowl [2], an analytical database that's optimized for running SQL queries directly from the user's browser (caching, CDNs, low latency, some WASM support, all that fun stuff).
[0] https://github.com/apache/arrow-datafusion
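To make "transforming query plans" and "pluggable optimizer rules" more concrete, here is a toy sketch in Python. It is not DataFusion's API: the `Scan` and `Filter` plan nodes, the `merge_adjacent_filters` rule, and the `optimize` driver are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    predicate: str
    input: "Plan"


Plan = Union[Scan, Filter]
Rule = Callable[[Plan], Plan]


def merge_adjacent_filters(plan: Plan) -> Plan:
    """Rewrite Filter(a, Filter(b, x)) into Filter("(a) AND (b)", x)."""
    if isinstance(plan, Filter):
        child = merge_adjacent_filters(plan.input)
        if isinstance(child, Filter):
            return Filter(f"({plan.predicate}) AND ({child.predicate})", child.input)
        return Filter(plan.predicate, child)
    return plan


def optimize(plan: Plan, rules: List[Rule]) -> Plan:
    """Apply each registered rule in order; callers can add their own rules."""
    for rule in rules:
        plan = rule(plan)
    return plan


plan = Filter("amount > 10", Filter("year = 2022", Scan("trips")))
optimized = optimize(plan, [merge_adjacent_filters])
print(optimized)
```

A rule is just a function from plan to plan, so adding a custom optimization means appending another function to the list passed to `optimize`.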
-
GlueSQL: A SQL database engine written as a library in Rust
Another "database toolkit" project that I've recently learned about is Apache DataFusion, which is also written in Rust and uses the Arrow memory format:
https://github.com/apache/arrow-datafusion/blob/master/READM...
-
Rust is showing a lot of promise in the DataFrame / tabular data space
[arrow-datafusion](https://github.com/apache/arrow-datafusion) is another great DataFrame library, especially if you like running SQL queries. It's so easy to query a Parquet / CSV dataset with SQL using DataFusion. I've run local benchmarks and it's super fast. The DataFusion docs are a bit lacking, which is a shame for such a developed and amazing library. I hope to make these better and help spread the word about how truly amazing this lib is.
-
Steampipe – Select * from Cloud;
To add somewhat of a counterpoint to the other response, I've tried the Steampipe CSV plugin and got 50x slower performance vs OctoSQL[0], which is itself 5x slower than something like DataFusion[1]. The CSV plugin doesn't contact any external APIs, so it should be a good benchmark of the plugin architecture, though it might just not be optimized yet.
That said, I don't imagine this ever being a bottleneck for the main use case of Steampipe - in that case I think the APIs themselves will always be the limiting part. But it does - potentially - speak to what you can expect if you'd like to extend your usage of Steampipe to more than just DevOps data.
[0]: https://github.com/cube2222/octosql
[1]: https://github.com/apache/arrow-datafusion
Disclaimer: author of OctoSQL
What are some alternatives?
redis-rs - Redis library for rust
polars - Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
rust-etcd - An etcd client library for Rust.
ClickHouse - ClickHouse® is a free analytics DBMS for big data
rust-rocksdb - rust wrapper for rocksdb
cassandra-rs - Cassandra (CQL) driver for Rust, using the DataStax C/C++ driver under the covers.
db-benchmark - reproducible benchmark of database-like ops
diesel - A safe, extensible ORM and Query Builder for Rust
rust-postgres - Native PostgreSQL driver for the Rust programming language
cassandra-rust
KeyDB - A Multithreaded Fork of Redis
leveldb