lightflus
Daft
lightflus | Daft | |
---|---|---|
1 | 9 | |
95 | 1,761 | |
- | 8.0% | |
5.8 | 9.8 | |
about 1 year ago | 4 days ago | |
Rust | Rust | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lightflus
Daft
-
Daft: Distributed DataFrame for Python
There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.
-
Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
Hi (one of the maintainers here), that is a good suggestion! I wasn't aware of that project. I went ahead and made an issue to add `export DO_NOT_TRACK=1` as one of the variables we track! https://github.com/Eventual-Inc/Daft/issues/1015
-
Daft: The Distributed Python Dataframe
We are looking at supporting other distributed backends as well - please drop by our discussion forums (https://github.com/Eventual-Inc/Daft/discussions) and drop us a message if you have any suggestions! We’d love to hear from you :)
What are some alternatives?
dora - Dataflow-Oriented Robotic Application is middleware that streamlines and simplifies the creation of AI-based robotic applications with low latency, composable, and distributed dataflow.
polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust
zenoh - zenoh unifies data in motion, data in-use, data at rest and computations. It carefully blends traditional pub/sub with geo-distributed storages, queries and computations, while retaining a level of time and space efficiency that is well beyond any of the mainstream stacks.
xvc - A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
datafuse - An elastic and reliable Cloud Warehouse, offers Blazing Fast Query and combines Elasticity, Simplicity, Low cost of the Cloud, built to make the Data Cloud easy [Moved to: https://github.com/datafuselabs/databend]
hamilton - A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
moon - A task runner and repo management tool for the web ecosystem, written in Rust.
deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
eszip - A compact file format to losslessly serialize an ECMAScript module graph into a single file
quokka - Making data lake work for time series
flowistry - Flowistry is an IDE plugin for Rust that helps you focus on relevant code.
hamilton - Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage and metadata. Runs and scales everywhere python does.