delta-rs
Mage
 | delta-rs | Mage
---|---|---
Mentions | 28 | 77
Stars | 1,820 | 7,001
Growth | 6.1% | 5.6%
Activity | 9.7 | 9.9
Last commit | 1 day ago | 3 days ago
Language | Rust | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
delta-rs
- Delta-rs - a Rust-based implementation of deltalake
-
Delta Lake vs. Parquet: A Comparison
I work at Databricks, but am pretty much just an OSS nerd, mainly focusing on Delta Rust recently: https://github.com/delta-io/delta-rs
I did some keyword research and wrote this post cause lots of folks are doing searches for Delta Lake vs Parquet. I'm just trying to share a fair summary of the tradeoffs with folks who are doing this search. It's a popular post and that's why I figured I would share it here.
-
Working with Rust
Seeing a lot of great libraries coming out with Python bindings in the data world, e.g. delta-rs and Polars. I see it growing in this space as a C++ alternative.
-
Ideas/Suggestions around setting up a data pipeline from scratch
If I'm not misunderstanding, you could both decode the gRPC protobuf AND write to Delta Lake in Rust. Tonic, Delta-rs.
-
Delta-rs with upserts
https://github.com/delta-io/delta-rs/issues/850 … looks like it's on the roadmap!
-
Read and filter delta files on Azure from a .net application
Microsoft talked a lot about OneLake, and about the Delta file format becoming the standard, during the Build conference. Am I the only one who finds it strange that their marketing team talks so much about the Delta format when they do not even provide a library for working with it from .NET? It would be easy for them to maintain bindings to https://github.com/delta-io/delta-rs but also provide a reader that supports V-Order https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql
-
Polars query engine 0.29.0 released
I know someone will be adding this on the Python side in the coming weeks. On the Rust side you can use delta-rs with Polars, though you would be compiling both arrow2 and arrow-rs, so that's quite heavy.
-
Delta Lake without Databricks?
You don't need DBX to use Delta Lake. You can use S3 as the backend and just use the Python Delta Lake library. It works great! https://github.com/delta-io/delta-rs
-
Seeking Recommendations for a Master Data Management Tool
Maybe if I get some free time soon I can formalize this into a working example. I've been wanting an excuse to try a similar concept in delta-rs and polars/duckdb vs databricks/spark vs iceberg/polars.
-
Opportunity to contribute to a popular Rust data project (delta-rs)
delta-rs is a native Rust library for Delta Lake. It's a better way to store data than Parquet files and is a fundamentally important library for the Rust data ecosystem. It's tightly integrated with Polars and DataFusion, and there is a lot of interesting Rust work to be done.
Mage
- FLaNK AI-April 22, 2024
-
A mage on the Hero's Journey: a fantasy epic on how a startup rose from the ashes
In the coming years, Mage will create a cooperative experience so that developers can build data pipelines with their team and level up together. After that journey, Mage will go on an epic quest to create the 1st open world community experience in the data universe.
-
Data sources episode 2: AWS S3 to Postgres Data Sync using Singer
Link to original blog: https://www.mage.ai/blog/data-sources-ep-2-aws-s3-to-postgres-data-sync-using-singer
-
What are some open-source ML pipeline managers that are easy to use?
I would recommend the following: - https://www.mage.ai/ - https://dagster.io/ - https://www.prefect.io/ - https://metaflow.org/ - https://zenml.io/home
-
Mage Battlegrounds: Craft insights from real-time customer behavior analysis
You're invited to participate in the very first Mage Battlegrounds: Craft insights from real-time customer behavior analysis, a 24-hour virtual hackathon hosted by Shashank Mishra! This data engineering competition will take place on Saturday, April 15, 2023 beginning at 11am (PST). This will be a global event open to all participants who register.
-
Looking for an open-source project
Try this feature: https://github.com/mage-ai/mage-ai/issues/1166
-
Daskqueue: Dask-based distributed task queue
Seeing if we can use it in https://github.com/mage-ai/mage-ai
-
Data Pipeline on a Shoestring
That being said, there's a solid family of services just breaking ground that make local pipeline deployment easier (check out https://www.mage.ai, which does have a clear path to cloud deployment of locally developed pipes, it just isn't well documented yet, and also https://www.neuronsphere.io - which doesn't have a public solution YET (they're internally testing an alpha) but they built a cloud deployable solution for their paying customers and are working to release one for freemium use)
-
Trending ML repos of the week
7️⃣ mage-ai/mage-ai
-
Delta without using Spark
Yes, check out how Mage does it: https://github.com/mage-ai/mage-ai/tree/master/mage_integrations/mage_integrations/destinations/delta_lake_s3
What are some alternatives?
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
dagster - An orchestration platform for the development, production, and observation of data assets.
roapi - Create full-fledged APIs for slowly moving datasets without writing a single line of code.
vscode-dvc - Machine learning experiment tracking and data versioning with DVC extension for VS Code
materialize - The data warehouse for operational workloads.
sqlmesh - Efficient data transformation and modeling framework that is backwards compatible with dbt.
ballista - Distributed compute platform implemented in Rust, and powered by Apache Arrow.
mito - The mitosheet package, trymito.io, and other public Mito code.
kafka-delta-ingest - A highly efficient daemon for streaming data from Kafka into Delta Lake
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
delta-oss
Data-Science-Roadmap - Data Science Roadmap from A to Z