|almost 5 years ago||3 days ago|
|MIT License||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
We haven't tracked posts mentioning utah yet.
Tracking mentions began in Dec 2020.
What are your favorite tools or components in the Kafka ecosystem?
10 projects | /r/apachekafka | 31 May 2023
A Python package for streaming synthetic data
2 projects | /r/Python | 25 May 2023
This is great, definitely see the utility here. I have had to hack this together so many times while building streaming workflows with github.com/bytewax/bytewax and other tools.
Snowflake - what are the streaming capabilities it provides?
3 projects | /r/dataengineering | 10 May 2023
When low latency matters you should always consider an ETL approach rather than ELT, e.g. collect data in Kafka and process using Kafka Streams/Flink in Java or Quix Streams/Bytewax in Python, then sink it to Snowflake where you can handle non-critical workloads (as is the case for 99% of BI/analytics). This way you can choose the right path for your data depending on how quickly it needs to be served.
Sunday Daily Thread: What's everyone working on this week?
3 projects | /r/Python | 6 May 2023
Working on how to use https://github.com/bytewax/bytewax to create embeddings in real time for ML use cases. I want to make a small library for embedding pipelines, but I'm still learning about vector DBs and the tradeoffs between the different solutions.
Arroyo: A distributed stream processing engine written in Rust
3 projects | /r/rust | 4 Apr 2023
Project looks cool! Glad you open sourced it. It could use some comments in the code base to help contributors ;). I also like the datafusion usage, that is awesome. BTW I work on github.com/bytewax/bytewax, which is based on https://github.com/TimelyDataflow/timely-dataflow, another Rust dataflow computation engine.
Launch HN: BuildFlow (YC W23) – The FastAPI of data pipelines
3 projects | news.ycombinator.com | 15 Mar 2023
Cool, nice idea. Can you sub in a different backend like bytewax (https://github.com/bytewax/bytewax) for stateful processing?
Kafka Stream Processing in Java or Scala
3 projects | /r/dataengineering | 24 Feb 2023
I don't mean to discourage learning a new language, but just as an FYI: if you want to stay within your Python/SQL area of expertise, there are some non-Java/Scala tools, including streaming databases like risingwave and materialize, streaming platforms like fluvio and redpanda, and stream processors like bytewax and faust.
ETL using pure python (no Pandas)
2 projects | /r/dataengineering | 24 Jan 2023
Also worth considering bytewax for streaming ingestion and transformation if you are focusing on Python.
Using Bytewax to build an anomaly detection app
2 projects | dev.to | 5 Oct 2022
Bytewax is an up-and-coming data processing framework that is built on top of Timely Dataflow, which is a cyclic dataflow computational model. At a high level, dataflow programming is a programming paradigm where program execution is conceptualized as data flowing through a series of operator-based steps. The Timely Dataflow library is written in Rust, which makes it fast, while Rust's Python bindings make it easy to use from Python.
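To make the "data flowing through a series of operator-based steps" idea concrete, here is a minimal sketch using plain Python generators. It does not use the Bytewax or Timely Dataflow APIs; the helper names `map_step`, `filter_step`, and `run_dataflow`, as well as the sample readings and the threshold, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator, List

# A step takes a stream of items and returns a new stream of items.
Step = Callable[[Iterator], Iterator]


def map_step(fn: Callable) -> Step:
    """Build a step that applies fn to every item (hypothetical helper)."""
    def step(stream: Iterator) -> Iterator:
        for item in stream:
            yield fn(item)
    return step


def filter_step(pred: Callable) -> Step:
    """Build a step that keeps only items for which pred is true."""
    def step(stream: Iterator) -> Iterator:
        for item in stream:
            if pred(item):
                yield item
    return step


def run_dataflow(source: Iterable, steps: List[Step]) -> list:
    """Chain the steps in order and pull every item through them."""
    stream = iter(source)
    for step in steps:
        stream = step(stream)
    return list(stream)


# Assumed sample input: four sensor readings.
readings = [1.0, 2.5, 40.0, 3.0]

flow = [
    map_step(lambda x: x * 2),      # operator 1: scale each reading
    filter_step(lambda x: x > 10),  # operator 2: keep only large values
]

result = run_dataflow(readings, flow)
print(result)
```

Each operator only sees the output of the previous one, which is the property that lets a real dataflow engine distribute or parallelize the steps. This sketch runs in a single process and keeps no state between items.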
This Week in Python (July 15, 2022)
6 projects | dev.to | 15 Jul 2022
bytewax – A Python framework for building highly scalable dataflows
What are some alternatives?
rust-ndarray - ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
timely-dataflow - A modular implementation of timely dataflow in Rust
Bash-Oneliner - A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
mech - 🦾 Main repository for the Mech programming language. Start here!
awesome-web-scraping - List of libraries, tools and APIs for web scraping and data processing.
ux-dataflow - UX-Dataflow is a streaming capable data multiplexer that allows you to aggregate data and then process it using a Chain of Responsibility design pattern.