Top 23 Stream Processing Open-Source Projects
-
mediapipe/docs/solutions/pose.md at master · google/mediapipe · GitHub
-
These ones found no difference:
http://www.etalabs.net/compare_libcs.html
https://users.rust-lang.org/t/optimizing-rust-binaries-obser...
This guy found Musl much slower for multithreaded allocation:
https://www.linkedin.com/pulse/testing-alternative-c-memory-...
These found Musl a bit slower with LTO:
https://github.com/vectordotdev/vector/issues/2313
Except for that LinkedIn one (which feels like it might be a bug), it seems like there isn't really much in it, which is what I'd expect tbh. Kind of like Clang vs GCC. Sometimes one is faster, but probably not by much.
-
12. Awesome Big Data
-
redpanda
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
-
awesome-system-design
A curated list of awesome System Design (A.K.A. Distributed Systems) resources.
Project mention: Ask HN: Resources to learn boring architecture for a small startup? | news.ycombinator.com | 2023-12-25
https://github.com/madd86/awesome-system-design
-
Project mention: connect VS goka - a user suggested alternative | libhunt.com/r/redpanda-data/connect | 2024-07-23
-
Project mention: Watermill -- a Go library for working efficiently with message streams | news.ycombinator.com | 2024-10-03
-
risingwave
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
Project mention: RisingWave: Process, manage, and analyze event streams with Postgres-style SQL | news.ycombinator.com | 2024-07-18
-
Project mention: Faust VS quix-streams - a user suggested alternative | libhunt.com/r/faust | 2023-12-07
-
Hazelcast
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
-
Project mention: Data on Kubernetes: Part 4 - Argo Workflows: Simplify parallel jobs : Container-native workflow engine for Kubernetes 🔮 | dev.to | 2024-07-28
In this section, we'll dive into creating and deploying a data processing platform on Amazon Elastic Kubernetes Service (Amazon EKS). The solution includes essential Kubernetes add-ons: Argo Workflows, Argo Events, Spark Operator for managing Spark jobs, Fluent Bit for logging, and Prometheus for metrics.
-
materialize
The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data. (by MaterializeInc)
Project mention: Rama on Clojure's terms, and the magic of continuation-passing style | news.ycombinator.com | 2024-10-14
-
Project mention: Shades of Open Source - Understanding The Many Meanings of "Open" | dev.to | 2024-06-15
In the world of table formats, there are three competing standards: Apache Iceberg, Apache Hudi, and Delta Lake, with two out of the three being Apache projects (and there is also Apache XTable for interoperability between these and future formats). For catalogs, options include Nessie, Gravitino, Polaris, and Unity Catalog, all of which are open source but not yet Apache projects.
-
danfojs
Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Website: Danfo.js
-
Project mention: Show HN: Pathway – Build Mission Critical ETL and RAG in Python (NATO, F1 Used) | news.ycombinator.com | 2024-06-13
The main factor impacting the RAM requirement of the instance is the size of the data that you feed into it, especially if you need an in-memory index. (If you are curious about peak memory use etc., you can profile Pathway memory use in Grafana: https://github.com/pathwaycom/pathway/tree/main/examples/pro....)
One point to clarify is that "Pathway Community" is self-hosted, and the "8GB RAM - 4 cores" value is just a limit on the dimension of your own/cloud machine that the framework will effectively use. Currently, if you would like to get a "free" cloud machine to go with your project, we suggest going for "Pathway Scale" and reaching out through the #Developer Assist link - add a mention that you are interested in cloud credits. You can also go with 3rd party hosting providers like http://render.com/ who have a (somewhat modest) free tier for Docker instances, or reasonably priced ones like fly.io https://fly.io/docs/about/pricing/.
-
fluvio
Lean and mean distributed stream processing system written in Rust and WebAssembly. An alternative to Kafka + Flink in one.
[dependencies]
futures = { version = "0.3", default-features = false }
serde = { version = "1.0", default-features = false, features = ["derive"] }
serde_json = { version = "1", default-features = false }
anyhow = { version = "1.0" }
async-std = { version = "1.8", default-features = false, features = ["attributes", "tokio1"] }
async-trait = { version = "0.1", default-features = false }
fluvio = { git = "https://github.com/infinyon/fluvio", rev = "98cfc21314c93d4c2898edc9e2160f280622be21" }
fluvio-connector-common = { git = "https://github.com/infinyon/fluvio", rev = "98cfc21314c93d4c2898edc9e2160f280622be21", features = ["derive"] }
humantime = "2.1.0"
google-sheets4 = "*"
-
faststream
FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.
Project mention: FastStream v0.4.0: Introducing Confluent Kafka Integration with Async Support | news.ycombinator.com | 2024-01-30
-
Memgraph
Open-source graph database, tuned for dynamic analytics environments. Easy to adopt, scale and own.
Memgraph — Real-time graph database for streaming data.
-
peerdb
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.
I did end up using a `chan chan` when implementing a threadpool: https://github.com/PeerDB-io/peerdb/pull/1613/files#diff-427...
The inner channel represents a future, while the outer channel has the threadpool reading a stream of futures. This way, parallelism doesn't corrupt the ordering.
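To make the channel-of-channels idea concrete, here is a minimal sketch in Go. It is not the code from the linked PeerDB pull request; `processOrdered`, `job`, and the squaring function are hypothetical names chosen for illustration. Each submitted item gets its own one-slot result channel (the "future"), and those channels are pushed onto an outer channel in submission order, so the reader receives results in order even though workers finish in any order.

```go
package main

import (
	"fmt"
	"sync"
)

// processOrdered (hypothetical helper) applies fn to every input using
// `workers` goroutines and returns the results in input order.
func processOrdered(inputs []int, workers int, fn func(int) int) []int {
	// job pairs an input with the channel that will carry its single result.
	type job struct {
		in  int
		out chan int // inner channel: acts as a future
	}

	jobs := make(chan job)
	// Outer channel: a stream of futures in submission order. It is
	// buffered so the submitter never blocks on it.
	futures := make(chan chan int, len(inputs))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				j.out <- fn(j.in) // never blocks: out has capacity 1
			}
		}()
	}

	// Submit work: register the future first, then hand the job to a worker.
	go func() {
		for _, in := range inputs {
			out := make(chan int, 1)
			futures <- out
			jobs <- job{in, out}
		}
		close(jobs)
		close(futures)
	}()

	// Resolve futures in the order they were submitted.
	results := make([]int, 0, len(inputs))
	for f := range futures {
		results = append(results, <-f)
	}
	wg.Wait()
	return results
}

func main() {
	squares := processOrdered([]int{1, 2, 3, 4, 5}, 3, func(x int) int { return x * x })
	fmt.Println(squares) // prints [1 4 9 16 25]
}
```

The design choice worth noting is that ordering is fixed at submission time rather than at completion time: a slow job only delays the reader at its own position, and no sorting or sequence numbers are needed.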
Stream Processing discussion
Stream Processing related posts
-
Feldera Incremental Compute Engine
-
RisingWave: Process, manage, and analyze event streams with Postgres-style SQL
-
Bento, the open source fork of the project formerly known as Benthos
-
Streaming Processing
-
Benthos – Fancy stream processing made operationally mundane
-
Show HN: Streaming DataFrames–a Pandas-like syntax for real-time data
-
Building a streaming SQL engine with Arrow and DataFusion
-
Index
What are some of the best open-source Stream Processing projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | mediapipe | 27,046 |
2 | vector | 17,631 |
3 | awesome-bigdata | 13,186 |
4 | redpanda | 9,538 |
5 | awesome-system-design | 9,533 |
6 | connect | 8,111 |
7 | watermill | 7,417 |
8 | risingwave | 6,903 |
9 | Faust | 6,729 |
10 | Hazelcast | 6,116 |
11 | fluent-bit | 5,803 |
12 | materialize | 5,739 |
13 | hudi | 5,355 |
14 | river | 5,042 |
15 | danfojs | 4,773 |
16 | pathway | 3,982 |
17 | fluvio | 3,794 |
18 | arroyo | 3,687 |
19 | awesome-streaming | 2,680 |
20 | PipelineDB | 2,631 |
21 | faststream | 2,526 |
22 | Memgraph | 2,373 |
23 | peerdb | 2,181 |