Stream Processing

Open-source projects categorized as Stream Processing

Top 23 Stream Processing Open-Source Projects

  • mediapipe

    Cross-platform, customizable ML solutions for live and streaming media.

    Project mention: Mediapipe openpose Controlnet model for SD | /r/localdiffusion | 2023-11-15

    mediapipe/docs/solutions/pose.md at master · google/mediapipe · GitHub

  • vector

    A high-performance observability data pipeline.

    Project mention: Porting systemd to musl Libc-powered Linux | news.ycombinator.com | 2024-09-05

    These ones found no difference:

    http://www.etalabs.net/compare_libcs.html

    https://users.rust-lang.org/t/optimizing-rust-binaries-obser...

    This guy found Musl much slower for multithreaded allocation:

    https://www.linkedin.com/pulse/testing-alternative-c-memory-...

    These found Musl a bit slower with LTO:

    https://github.com/vectordotdev/vector/issues/2313

    Except for that LinkedIn one (which feels like it might be a bug), it seems like there isn't really much in it, which is what I'd expect tbh. Kind of like Clang vs GCC. Sometimes one is faster, but probably not by much.

  • awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

    Project mention: Top 20 Awesome on Github | dev.to | 2024-06-12

    12. Awesome Big Data

  • redpanda

    Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!

    Project mention: AutoMQ Integration with Redpanda Console | dev.to | 2024-07-29

    References
    [1] Redpanda Console: https://redpanda.com/redpanda-console-kafka-ui
    [2] Redpanda: https://redpanda.com/
    [3] Cluster Deployment of AutoMQ: https://docs.automq.com/zh/docs/automq-opensource/IyXrw3lHriVPdQkQLDvcPGQdnNh
    [4] Quick Start: https://github.com/redpanda-data/console?tab=readme-ov-file#quick-start
    [5] Release Redpanda Console: https://github.com/redpanda-data/console/releases/tag/v2.6.0
    [6] Redpanda Console Configuration: https://docs.redpanda.com/current/reference/console/config/#example-redpanda-console-configuration-file
    [7] Kafdrop Github: https://github.com/obsidiandynamics/kafdrop

  • awesome-system-design

    A curated list of awesome System Design (A.K.A. Distributed Systems) resources.

    Project mention: Ask HN: Resources to learn boring architecture for a small startup? | news.ycombinator.com | 2023-12-25

    https://github.com/madd86/awesome-system-design

  • connect

    Fancy stream processing made operationally mundane (by redpanda-data)

    Project mention: connect VS goka - a user suggested alternative | libhunt.com/r/redpanda-data/connect | 2024-07-23
  • watermill

    Building event-driven applications the easy way in Go.

    Project mention: Watermill -- a Go library for working efficiently with message streams | news.ycombinator.com | 2024-10-03
  • risingwave

    Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.

    Project mention: RisingWave: Process, manage, and analyze event streams with Postgres-style SQL | news.ycombinator.com | 2024-07-18
  • Faust

    Python Stream Processing

    Project mention: Faust VS quix-streams - a user suggested alternative | libhunt.com/r/faust | 2023-12-07
  • Hazelcast

    Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.

  • fluent-bit

    Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows

    Project mention: Data on Kubernetes: Part 4 - Argo Workflows: Simplify parallel jobs : Container-native workflow engine for Kubernetes 🔮 | dev.to | 2024-07-28

    In this section, we'll dive into creating and deploying a data processing platform on Amazon Elastic Kubernetes Service (Amazon EKS). The solution includes essential Kubernetes add-ons: Argo Workflows, Argo Events, Spark Operator for managing Spark jobs, Fluent Bit for logging, and Prometheus for metrics.

  • materialize

    The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data. (by MaterializeInc)

    Project mention: Rama on Clojure's terms, and the magic of continuation-passing style | news.ycombinator.com | 2024-10-14
  • hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Project mention: Shades of Open Source - Understanding The Many Meanings of "Open" | dev.to | 2024-06-15

    In the world of table formats, there are three competing standards: Apache Iceberg, Apache Hudi, and Delta Lake, with two out of the three being Apache projects (and there is also Apache XTable for interoperability between these and future formats). For catalogs, options include Nessie, Gravitino, Polaris, and Unity Catalog, all of which are open source but not yet Apache projects.

  • river

    🌊 Online machine learning in Python

    Project mention: River: Online Machine Learning in Python | news.ycombinator.com | 2024-05-12
  • danfojs

    Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.

    Project mention: How to Work with Multidimensional Arrays in JavaScript | dev.to | 2024-09-18

    Website: Danfo.js

  • pathway

    Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

    Project mention: Show HN: Pathway – Build Mission Critical ETL and RAG in Python (NATO, F1 Used) | news.ycombinator.com | 2024-06-13

    The main factor impacting the RAM requirement of the instance is the size of the data that you feed into it, especially if you need an in-memory index. (If you are curious about peak memory use etc., you can profile Pathway memory use in Grafana: https://github.com/pathwaycom/pathway/tree/main/examples/pro....)

    One point to clarify is that "Pathway Community" is self-hosted, and the "8GB RAM - 4 cores" value is just a limit on the dimension of your own/cloud machine that the framework will effectively use. Currently, if you would like to get a "free" cloud machine to go with your project, we suggest going for "Pathway Scale" and reaching out through the #Developer Assist link - add a mention that you are interested in cloud credits. You can also go with 3rd party hosting providers like http://render.com/ who have a (somewhat modest) free tier for Docker instances, or reasonably priced ones like fly.io https://fly.io/docs/about/pricing/.

  • fluvio

    Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.

    Project mention: Yes, It's easy to build a Fluvio connector in Rust. | dev.to | 2024-09-08

    [dependencies]
    futures = { version = "0.3", default-features = false }
    serde = { version = "1.0", default-features = false, features = ["derive"] }
    serde_json = { version = "1", default-features = false }
    anyhow = { version = "1.0" }
    async-std = { version = "1.8", default-features = false, features = ["attributes", "tokio1"] }
    async-trait = { version = "0.1", default-features = false }
    fluvio = { git = "https://github.com/infinyon/fluvio", rev = "98cfc21314c93d4c2898edc9e2160f280622be21" }
    fluvio-connector-common = { git = "https://github.com/infinyon/fluvio", rev = "98cfc21314c93d4c2898edc9e2160f280622be21", features = ["derive"] }
    humantime = "2.1.0"
    google-sheets4 = "*"

  • arroyo

    Distributed stream processing engine in Rust

    Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • awesome-streaming

    A curated list of awesome streaming frameworks, applications, etc.

    Project mention: Streaming Processing | news.ycombinator.com | 2024-05-28
  • PipelineDB

    High-performance time-series aggregation for PostgreSQL

    Project mention: PostgreSQL Is Enough | news.ycombinator.com | 2024-02-06
  • faststream

    FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.

    Project mention: FastStream v0.4.0: Introducing Confluent Kafka Integration with Async Support | news.ycombinator.com | 2024-01-30
  • Memgraph

    Open-source graph database, tuned for dynamic analytics environments. Easy to adopt, scale and own.

    Project mention: List of 45 databases in the world | dev.to | 2024-07-09

    Memgraph — Real-time graph database for streaming data.

  • peerdb

    A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues and storage

    Project mention: The 4-chan Go programmer | news.ycombinator.com | 2024-08-28

    I did end up using a `chan chan` when implementing a threadpool: https://github.com/PeerDB-io/peerdb/pull/1613/files#diff-427...

    The inner channel represents a future, while the outer channel has the threadpool reading a stream of futures. This way the ordering doesn't get corrupted by parallelism.
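    The `chan chan` comment describes a general pattern for parallel work with ordered output: an outer queue carries futures in submission order, and the consumer waits on each future in turn, so a fast later task cannot overtake a slow earlier one. PeerDB's own code is Go and is only partly linked above; what follows is a hedged translation of the idea to Python, assuming `concurrent.futures.Future` stands in for the inner channel and a bounded `queue.Queue` for the outer one. The names `ordered_map` and `_DONE` are hypothetical, and error handling beyond propagating the first exception is omitted.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_DONE = object()  # hypothetical sentinel marking the end of the stream of futures


def ordered_map(work: Callable[[T], R], items: Iterable[T], workers: int = 4) -> List[R]:
    """Apply `work` to each item on a thread pool; return results in input order."""
    # Outer "channel": futures in submission order. The bound limits how far
    # the producer can run ahead of the consumer.
    futures: queue.Queue = queue.Queue(maxsize=workers)
    results: List[R] = []

    with ThreadPoolExecutor(max_workers=workers) as pool:

        def produce() -> None:
            for item in items:
                # Inner "channel": each Future delivers exactly one result.
                futures.put(pool.submit(work, item))
            futures.put(_DONE)

        # Daemon thread, so an exception re-raised by fut.result() below
        # cannot leave a blocked producer that keeps the interpreter alive.
        producer = threading.Thread(target=produce, daemon=True)
        producer.start()
        while True:
            fut = futures.get()
            if fut is _DONE:
                break
            # Wait on futures in submission order, not completion order,
            # so a fast later task cannot overtake a slow earlier one.
            results.append(fut.result())
        producer.join()

    return results
```

    For example, `ordered_map(lambda x: x * x, range(5), workers=2)` returns `[0, 1, 4, 9, 16]` regardless of which tasks finish first. Bounding the outer queue at `workers` keeps the producer from submitting the entire input far ahead of a slow consumer.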

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

What are some of the best open-source Stream Processing projects? This list will help you:

#   Project                 Stars
1   mediapipe              27,046
2   vector                 17,631
3   awesome-bigdata        13,186
4   redpanda                9,538
5   awesome-system-design   9,533
6   connect                 8,111
7   watermill               7,417
8   risingwave              6,903
9   Faust                   6,729
10  Hazelcast               6,116
11  fluent-bit              5,803
12  materialize             5,739
13  hudi                    5,355
14  river                   5,042
15  danfojs                 4,773
16  pathway                 3,982
17  fluvio                  3,794
18  arroyo                  3,687
19  awesome-streaming       2,680
20  PipelineDB              2,631
21  faststream              2,526
22  Memgraph                2,373
23  peerdb                  2,181
