kafka-delta-ingest vs PyO3
| | kafka-delta-ingest | PyO3 |
|---|---|---|
| Mentions | 6 | 147 |
| Stars | 325 | 11,044 |
| Growth | 3.4% | 2.3% |
| Activity | 7.4 | 9.8 |
| Last commit | 18 days ago | 3 days ago |
| Language | Rust | Rust |
| License | Apache License 2.0 | Apache License 2.0 or MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
kafka-delta-ingest
- Using Rust for DE activities?
  Rust can offer incredible cost savings when you can use it in place of Spark to interact with your Delta Lake. One such project was kafka-delta-ingest: its developers were able to reduce the cost of running the pipeline by over 90%. Most of this tooling is still very experimental and not ready for production, but you will definitely be seeing more projects like this, based on how much money can be saved.
- Which lakehouse table format do you expect your organization will be using by the end of 2023?
  This independence from a catalog allows for path-based reads and writes. This is handy when writing from Kafka directly to Delta Lake for the first layer of ingestion. You don’t need a catalog (or even Spark). https://github.com/delta-io/kafka-delta-ingest/tree/main/src
- Streaming Data and Postgres
  As far as I know, no. You certainly could use events on a streaming ledger like Kafka or Redpanda, store them to Delta with https://github.com/delta-io/kafka-delta-ingest, and process them with all the GIS goodness of Spark. However, this is fairly complicated and very different from a simple drop-in replacement for PostGIS. There are specialized (meaning faster and more efficient) systems out there for specialized tasks such as real-time geofencing.
- Rust is showing a lot of promise in the DataFrame / tabular data space
  kafka-delta-ingest is a good project to get streaming data into a Delta Lake. Here's a great talk on the topic.
- process millions of events per sec
  What about https://github.com/delta-io/kafka-delta-ingest?
- Exactly once delivery from Kafka to Delta Lake with Rust
PyO3
- Encapsulation in Rust and Python
  - Integrating Rust into Python, Edward Wright, 2021-04-12
  - Examples for making rustpython run actual python code
  - Calling Rust from Python using PyO3
  - Writing Python inside your Rust code — Part 1, 2020-04-17
  - RustPython, RustPython
  - Rust for Python developers: Using Rust to optimize your Python code
  - PyO3 (Rust bindings for Python)
  - Musing About Pythonic Design Patterns In Rust, Teddy Rendahl, 2023-07-14
- Rust Bindings for the Python Interpreter
- Polars – A bird's eye view of Polars
- In Rust for Python: A Match from Heaven
  This story unfolds as a captivating journey where the agile Flounder, representing the Python programming language, navigates the vast seas of coding under the wise guidance of Sebastian, symbolizing Rust. Central to their adventure are three powerful tridents: cargo, PyO3, and maturin.
- Second language
- Calling Rust from Python
  I would not recommend FFI + ctypes. Maintaining the bindings is tedious and error-prone. Also, Rust FFI/unsafe can be tricky even for experienced Rust devs.
  Instead, PyO3 [1] lets you "write a native Python module in Rust", and it works great. A much better choice IMO.
  [1] https://github.com/PyO3/pyo3
- Python 3.12
  Same with Rust and Python: this is really neat because now each thread could have a GIL without doing exactly what you said. The PyO3 commit to allow subinterpreters was merged 21 days ago, so this might "just work" today: https://github.com/PyO3/pyo3/pull/3446
- Removing Garbage Collection from the Rust Language (2013)
  I expected someone to write a Rust-based scripting language that integrated tightly with Rust itself.
  In reality, it seems the Python developers and toolchain are embracing Rust enough to reduce the benefits of a new alternative.
  https://github.com/PyO3/pyo3
- Bytewax: Stream processing library built using Python and Rust
  Hey HN! I am one of the people working on Bytewax. Bytewax came out of our experience working with ML infrastructure at GitHub. We wanted to use Python because we could move fast, the team was very fluent in it, and the rest of our tooling was Python-native already. We didn't want to introduce JVM-based solutions into our stack because of our lack of experience with them and the friction we had trying to get Python-centric tooling working with existing solutions like Flink.
  In our research, we found Timely Dataflow (https://timelydataflow.github.io/timely-dataflow/, https://news.ycombinator.com/item?id=24837031) and the Naiad project (https://www.microsoft.com/en-us/research/project/naiad/) as well as PyO3 (https://github.com/PyO3/pyo3), and we thought we had found a match made in heaven :). Bytewax leverages both of these projects and builds on them to provide a clean API (at least we think so) and table-stakes features like connectors, state recovery, and cloud-native scaling. It has been really cool to learn about the dataflow computation model, Rust, and how to wrangle the GIL with Rust and Python :P.
  Would love to get your feedback :).
  `pip install bytewax` to get started. We have a page of guides (https://www.bytewax.io/guides) with ready-to-run examples.
- Tell HN: Rust Is the Superglue
  You can practice your Rust skills by writing performant and/or gluey extensions for higher-level languages such as Node.js (check out napi-rs) and Python, or by complementing JS in the browser if you target WebAssembly.
  For instance, check out llama-node (https://github.com/Atome-FE/llama-node) for an involved Rust-based Node.js extension. Python has PyO3, a Rust-Python extension toolset: https://github.com/PyO3/pyo3.
  They can help you leverage your Rust for writing cool new stuff.
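The "Calling Rust from Python" comment above describes hand-maintained FFI + ctypes bindings as tedious and error-prone. As a rough illustration of that burden, the sketch below binds a single native function by hand with Python's standard `ctypes` module. It assumes a POSIX system and uses libc's `strlen` as a stand-in for a function exported from a Rust `cdylib`; the wrapper name `c_strlen` is hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: libc stands in for a Rust cdylib that exports `extern "C"` functions.
# On POSIX, CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Every exported function needs its signature declared by hand. If these two
# lines drift out of sync with the native side, ctypes has no way to notice.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical wrapper: UTF-8 byte length of `text`, computed by strlen."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("PyO3"))
```

Each additional function repeats the same manual declaration step. This is the bookkeeping that PyO3 takes over by generating the Python-facing glue from the Rust function signatures.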
What are some alternatives?
delta-rs - A native Rust library for Delta Lake, with bindings into Python
rust-cpython - Rust <-> Python bindings
dipa - dipa makes it easy to efficiently delta encode large Rust data structures.
pybind11 - Seamless operability between C++11 and Python
kafka-rust - Rust client for Apache Kafka
RustPython - A Python Interpreter written in Rust
rust-rdkafka - A fully asynchronous, futures-based Kafka client library for Rust based on librdkafka
milksnake - A setuptools/wheel/cffi extension to embed binary data in wheels
flowgger - A fast data collector in Rust
bincode - A binary encoder / decoder implementation in Rust.
arrow2 - Transmute-free Rust library to work with the Arrow format
uniffi-rs - A multi-language bindings generator for Rust