Top 23 Stream Processing Open-Source Projects

mediapipe

49 25,405 9.9 C++

Cross-platform, customizable ML solutions for live and streaming media.

Project mention: Mediapipe openpose Controlnet model for SD | /r/localdiffusion | 2023-11-15

mediapipe/docs/solutions/pose.md at master · google/mediapipe · GitHub

vector

96 16,512 9.9 Rust

A high-performance observability data pipeline.

Project mention: Docker Log Observability: Analyzing Container Logs in HashiCorp Nomad with Vector, Loki, and Grafana | dev.to | 2024-04-19

job "vector" {
  datacenters = ["dc1"]
  # system job, runs on all nodes
  type = "system"
  group "vector" {
    count = 1
    network {
      port "api" {
        to = 8686
      }
    }
    ephemeral_disk {
      size   = 500
      sticky = true
    }
    task "vector" {
      driver = "docker"
      config {
        image   = "timberio/vector:0.30.0-debian"
        ports   = ["api"]
        volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
      }
      env {
        VECTOR_CONFIG          = "local/vector.toml"
        VECTOR_REQUIRE_HEALTHY = "false"
      }
      resources {
        cpu    = 100 # 100 MHz
        memory = 100 # 100MB
      }
      # template with Vector's configuration
      template {
        destination   = "local/vector.toml"
        change_mode   = "signal"
        change_signal = "SIGHUP"
        # overriding the delimiters to [[ ]] to avoid conflicts with Vector's native templating, which also uses {{ }}
        left_delimiter  = "[["
        right_delimiter = "]]"
        data=<

awesome-bigdata

3 12,792 1.5

A curated list of awesome big data frameworks, resources and other awesomeness.

Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13

redpanda

69 8,784 10.0 C++

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!

Project mention: Choosing Between a Streaming Database and a Stream Processing Framework in Python | dev.to | 2024-02-10

Stream-processing platforms such as Apache Kafka, Apache Pulsar, or Redpanda are engineered for event-driven communication in distributed systems, and they can be a great choice for building loosely coupled applications. Because they analyze data in motion, they can react with very low latency. For example, consider an alert system for monitoring factory equipment: if a machine's temperature exceeds a certain threshold, a streaming platform can trigger an alert immediately so that engineers can perform timely maintenance.
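
The factory-equipment example can be sketched in a few lines of Python. This is only an illustration, not code from the quoted post: the `Reading` type, the `alerts` function, the 90.0 threshold, and the in-memory list standing in for a live stream are all assumptions. A real deployment would read events from a broker such as Kafka, Pulsar, or Redpanda instead.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    """One temperature event from a machine (hypothetical schema)."""
    machine_id: str
    temperature_c: float


def alerts(readings: Iterable[Reading], threshold_c: float) -> Iterator[str]:
    """Yield an alert message as soon as a reading exceeds the threshold.

    The readings are consumed lazily, one at a time, so the same function
    works on an unbounded stream as well as on a finite list.
    """
    for reading in readings:
        if reading.temperature_c > threshold_c:
            yield (
                f"ALERT {reading.machine_id}: "
                f"{reading.temperature_c:.1f} C exceeds {threshold_c:.1f} C"
            )


# Hypothetical sample data standing in for a live event stream.
SAMPLE = [
    Reading("press-1", 71.5),
    Reading("press-2", 93.2),
    Reading("press-1", 88.0),
]

if __name__ == "__main__":
    for message in alerts(SAMPLE, threshold_c=90.0):
        print(message)
```

Because `alerts` is a generator, each alert is produced when its reading arrives rather than after the whole input has been collected, which is the property the excerpt attributes to processing data in motion.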

awesome-system-design

14 8,297 2.2

A curated list of awesome System Design (A.K.A. Distributed Systems) resources.

Project mention: Ask HN: Resources to learn boring architecture for a small startup? | news.ycombinator.com | 2023-12-25

https://github.com/madd86/awesome-system-design

Benthos

76 7,559 9.6 Go

Fancy stream processing made operationally mundane

Project mention: Ask HN: Who is hiring? (December 2023) | news.ycombinator.com | 2023-12-01

watermill

23 6,729 6.5 Go

Building event-driven applications the easy way in Go.

Project mention: Microservices communication | /r/golang | 2023-12-09

I’ve successfully worked on projects using an asynchronous event-driven way of connecting services. I really like the decoupling of business logic and the events triggering it. I highly recommend https://github.com/ThreeDotsLabs/watermill to be more flexible when it comes to choosing the actual technology driving the async pattern. It might be NATS today, but requirements might change and you may need to switch. Watermill prepares you for this.

Faust

8 6,674 1.4 Python

Python Stream Processing

Project mention: Faust VS quix-streams - a user suggested alternative | libhunt.com/r/faust | 2023-12-07

risingwave

27 6,283 10.0 Rust

Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.

Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30

How does this compare to RisingWave and Materialize?
https://github.com/risingwavelabs/risingwave

Hazelcast

7 5,861 9.9 Java

Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.

Project mention: Does anyone know any good java implementations for distributed key-value store? | /r/ExperiencedDevs | 2023-06-08

You're probably looking for Hazelcast here. Note that it does much more than just a distributed k/v, but it will get you where you need to go.

ksql

4 5,817 10.0 Java

The database purpose-built for stream processing applications.

materialize

117 5,567 10.0 Rust

The data warehouse for operational workloads. (by MaterializeInc)

Project mention: Ask HN: How Can I Make My Front End React to Database Changes in Real-Time? | news.ycombinator.com | 2024-04-17

[2] https://materialize.com/

fluent-bit

35 5,344 9.8 C

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows

Project mention: Observability at KubeCon + CloudNativeCon Europe 2024 in Paris | dev.to | 2024-03-26

Fluentbit

hudi

20 5,066 9.9 Java

Upserts, Deletes And Incremental Processing on Big Data.

Project mention: Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog | dev.to | 2023-12-18

Apache Iceberg is one of the three main lakehouse table formats; the other two are Apache Hudi and Delta Lake.

river

17 4,766 9.2 Python

🌊 Online machine learning in Python

Project mention: 🔍Underrated Open Source Projects You Should Know About 🧠 | dev.to | 2024-03-20

River is a Python library for online machine learning. Online models can adapt dynamically to new patterns in the data, which is useful when the data is generated as a function of time, e.g., stock price prediction or content personalization.
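
To make "adapting to new patterns" concrete, here is a minimal pure-Python sketch of an online learner. It is not River's implementation or API: the `OnlineEWMA` class, the `run` helper, and the sample stream are hypothetical, and an exponentially weighted moving average stands in for a real model. The point is only the predict-then-learn loop over one observation at a time.

```python
from typing import Iterable, List, Optional, Tuple


class OnlineEWMA:
    """Exponentially weighted moving average, updated one value at a time."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight given to the newest observation
        self.estimate: Optional[float] = None

    def predict_one(self) -> Optional[float]:
        """Return the current estimate (None before any data is seen)."""
        return self.estimate

    def learn_one(self, y: float) -> None:
        """Move the estimate toward the new observation."""
        if self.estimate is None:
            self.estimate = y
        else:
            self.estimate += self.alpha * (y - self.estimate)


def run(stream: Iterable[float], alpha: float = 0.5) -> Tuple[OnlineEWMA, List[Optional[float]]]:
    """Predict each value before learning from it, as an online learner would."""
    model = OnlineEWMA(alpha)
    predictions: List[Optional[float]] = []
    for y in stream:
        predictions.append(model.predict_one())  # prediction made before seeing y
        model.learn_one(y)
    return model, predictions


if __name__ == "__main__":
    # Hypothetical stream whose level shifts from 10 to 20 halfway through.
    model, predictions = run([10.0, 10.0, 10.0, 20.0, 20.0, 20.0])
    print(predictions)
    print(model.predict_one())
```

After the shift in the sample stream, each update pulls the estimate closer to the new level without retraining on earlier data, which is the behaviour the description refers to.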

danfojs

2 4,649 0.6 TypeScript

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

arroyo

13 3,275 9.6 Rust

Distributed stream processing engine in Rust

Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18

dpark

0 2,691 0.0 Python

Python clone of Spark, a MapReduce-like framework in Python

fluvio

26 2,638 9.5 Rust

Lean and mean distributed stream processing system written in Rust and WebAssembly.

Project mention: Ask HN: WebSocket Relay? | news.ycombinator.com | 2024-02-27

PipelineDB

3 2,603 0.0 C

High-performance time-series aggregation for PostgreSQL

Project mention: PostgreSQL Is Enough | news.ycombinator.com | 2024-02-06

awesome-streaming

0 2,557 5.2

A curated list of awesome streaming frameworks, applications, etc.

Memgraph

44 2,086 9.7 C++

Open-source graph database, tuned for dynamic analytics environments. Easy to adopt, scale and own.

Project mention: Ask HN: Who is hiring? (March 2024) | news.ycombinator.com | 2024-03-01

Memgraph | Staff C++ Database Engineer | REMOTE (Central/Western Europe, LatAm, or North America) https://memgraph.com/
Memgraph is a Seed stage, open source graph database vendor. Graph DBs are a great solution for GenAI, logistics, cybersecurity and fintech so we are looking to grow aggressively this year.
We're looking for a staff-level engineer to set technical direction, mentor junior team members, and solve some very difficult problems.
Either DM me (the hiring manager) or apply here: https://join.com/companies/memgraph/10684850-staff-software-...

go-streams

9 1,753 6.6 Go

A lightweight stream processing library for Go

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Stream Processing related posts

Show HN: Streaming DataFrames–a Pandas-like syntax for real-time data
1 project | news.ycombinator.com | 23 Apr 2024
Building a streaming SQL engine with Arrow and DataFusion
1 project | news.ycombinator.com | 18 Mar 2024
FLaNK AI Weekly 18 March 2024
39 projects | dev.to | 18 Mar 2024
Proton, a fast and lightweight alternative to Apache Flink
7 projects | news.ycombinator.com | 30 Jan 2024
Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
Apache Pulsar VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
redpanda VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023

Index

What are some of the best open-source Stream Processing projects? This list will help you:

Rank	Project	Stars
1	mediapipe	25,405
2	vector	16,512
3	awesome-bigdata	12,792
4	redpanda	8,784
5	awesome-system-design	8,297
6	Benthos	7,559
7	watermill	6,729
8	Faust	6,674
9	risingwave	6,283
10	Hazelcast	5,861
11	ksql	5,817
12	materialize	5,567
13	fluent-bit	5,344
14	hudi	5,066
15	river	4,766
16	danfojs	4,649
17	arroyo	3,275
18	dpark	2,691
19	fluvio	2,638
20	PipelineDB	2,603
21	awesome-streaming	2,557
22	Memgraph	2,086
23	go-streams	1,753