Top 23 Dataflow Open-Source Projects

Drawflow

7 4,126 3.7 JavaScript

Simple flow library 🖥️🖱️

Project mention: How to make beautiful flowchart with Angular ? | /r/Frontend | 2023-07-27

❌📄Drawflow - Seems nice, but no docs, and last commit was a year ago

umbrella

4 3,205 9.9 TypeScript

⛱ Broadly scoped ecosystem & mono-repository of 190 TypeScript projects (and 155 examples) for general purpose, functional, data driven development
nextflow

9 2,538 9.7 Groovy

A DSL for data-driven computational pipelines

Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

> It's been a while since you can rerun/resume Nextflow pipelines
Yes, you can resume, but you need your whole upstream DAG to be present. Snakemake can rerun a job when only the dependencies of that job are present, which makes it possible to neatly manage disk usage, or to archive an intermediate state of a project and rerun things from there.
> and yes, you can have dry runs in Nextflow
You have stubs, which really isn't the same thing.
> I have no idea what you're referring to with the 'arbitrary limit of 1000 parallel jobs' though
I was referring to this issue: https://github.com/nextflow-io/nextflow/issues/1871. Except, the discussion doesn't do the issue full justice. Nextflow spawns each job in a separate thread, and when it tries to spawn 1000+ condor jobs it dies with a cryptic error message. Setting -Dnxf.pool.type=sync and -Dnxf.pool.maxThreads=N breaks the ability to resume: Nextflow attempts to rerun the pipeline instead.
> As for deleting temporary files, there are features that allow you to do a few things related to that, and other features being implemented.
There are some hacks for this - but nothing I would feel safe to integrate into a production tool. They are implementing something - you're right - and it's been the case for several years now, so we'll see.
Snakemake has all that out of the box.

Scio

7 2,520 9.6 Scala

A Scala API for Apache Beam and Google Cloud Dataflow.
pyt

2 2,161 0.0 Python

A Static Analysis Tool for Detecting Security Vulnerabilities in Python Web Applications
flowistry

15 1,815 7.3 Rust

Flowistry is an IDE plugin for Rust that helps you focus on relevant code.

Project mention: An IDE plugin for Rust that helps you focus on relevant code | news.ycombinator.com | 2023-12-14

bearer

18 1,736 9.6 Go

Code security scanning tool (SAST) to discover, filter and prioritize security and privacy risks.

Project mention: Show HN: Bearer Code Security Scanner Add Support for Java, PHP, Go, and Python | news.ycombinator.com | 2023-10-26

baklavajs

1 1,360 9.0 TypeScript

Graph / node editor in the browser using VueJS
bytewax

18 1,144 9.8 Python

Python Stream Processing

Project mention: Building a streaming SQL engine with Arrow and DataFusion | news.ycombinator.com | 2024-03-18

ipyflow

20 1,073 9.5 Python

A reactive Python kernel for Jupyter notebooks.

Project mention: Show HN: Marimo – an open-source reactive notebook for Python | news.ycombinator.com | 2024-01-12

You're probably referring to nbgather (https://github.com/microsoft/gather), which shipped with VSCode for a while.
nbgather used static slicing to get all the code necessary to reconstruct some cell. I actually worked with Andrew Head (original nbgather author) and Shreya Shankar to implement something similar in ipyflow (but with dynamic slicing and a not-as-nice interface): https://github.com/ipyflow/ipyflow?tab=readme-ov-file#state-...
I have no doubt something like this will make its way into marimo's roadmap at some point :)
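
To make the "static slicing" idea in the comment above concrete, here is a minimal sketch, assuming each notebook cell is a plain string of Python source. It is not nbgather's or ipyflow's implementation; the helper names `cell_symbols` and `backward_slice` are hypothetical, and the analysis only looks at plain variable names (no attributes, imports, function definitions, or control flow).

```python
import ast


def cell_symbols(source):
    """Return (defined, used) sets of plain variable names in one cell."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used


def backward_slice(cells, target):
    """Indices of the cells needed to reconstruct cells[target], in order.

    Walks backwards from the target cell and keeps any earlier cell that
    assigns a name the slice still needs (a simplifying assumption: every
    assignment in a kept cell is treated as unconditional).
    """
    needed = {target}
    _, wanted = cell_symbols(cells[target])
    for i in range(target - 1, -1, -1):
        defined, used = cell_symbols(cells[i])
        if defined & wanted:
            needed.add(i)
            # Names this cell defines are satisfied; its inputs become wanted.
            wanted = (wanted - defined) | used
    return sorted(needed)
```

For a notebook such as `["a = 1", "b = 2", "c = a + 1", "print(c)"]`, slicing on the last cell keeps the cells that define `a` and `c` and drops the unrelated `b = 2`. A dynamic slicer, as described for ipyflow, would instead record dependencies while the cells actually run.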

scipipe

1 1,054 3.0 Go

Robust, flexible and resource-efficient pipelines using Go and the commandline
dora

2 998 9.7 Rust

Low-latency, composable, and distributed dataflow for AI and robotic applications (by dora-rs)

Project mention: Dora: Low latency, composable, and distributed dataflow for AI and robotic | news.ycombinator.com | 2024-03-21

RaftLib

0 923 5.7 C++

The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators
streamly

8 847 9.7 Haskell

High performance, concurrent functional programming abstractions

Project mention: [ANN] Haskell Streamly 0.9.0 Release! | /r/haskell | 2023-05-25

https://github.com/composewell/streamly/issues/1307 seems related, but it was a long time ago. We weren't heavy users anyway, so our streaming philosophy is now "conduit if it's simple and plugging into a conduit-using library, streaming if you're doing complicated things".

pytm

1 836 6.9 Python

A Pythonic framework for threat modeling
NIPY

0 731 8.9 Python

Workflows and interfaces for neuroimaging packages
finn

4 661 0.0 Python

Dataflow compiler for QNN inference on FPGAs

Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13

FINN - https://github.com/Xilinx/finn

relic

13 392 3.6 Clojure

Functional relational programming for Clojure(Script).

Project mention: FoundationDB: A Distributed Key-Value Store | news.ycombinator.com | 2023-07-03

I've been tooling around with "Tuple Database", which claims to be FoundationDB for the frontend (by the original dev of Notion).
https://github.com/ccorcos/tuple-database/
I have found it conceptually similar to Relic or Datascript, but with strong performance guarantees - something Relic considers a potential issue. It also solves the problem of using reactive queries to trigger things like popups and fullscreen requests, which must be run in the same event loop as user input.
https://github.com/wotbrew/relic

PothosCore

1 300 0.6 C++

The Pothos data-flow framework
flowbase

3 161 0.0 Go

A Flow-based Programming inspired micro-framework / un-framework for Go (Golang)
blocks

3 151 4.9 JavaScript

Blocks. An online drag-and-drop smart contract builder. (by Blocks-Editor)
ObservableComputations

3 108 0.0 C#

Cross-platform .NET library for computations whose arguments and results are objects that implement INotifyPropertyChanged and INotifyCollectionChanged (ObservableCollection) interfaces.
entangle

8 105 0.0 Python

A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Dataflow related posts

Dora: Low latency, composable, and distributed dataflow for AI and robotic
1 project | news.ycombinator.com | 21 Mar 2024
An IDE plugin for Rust that helps you focus on relevant code
1 project | news.ycombinator.com | 14 Dec 2023
Flowistry: an IDE plugin that analyzes the information flow of Rust programs, showing whether it's possible for one piece of code to affect another
1 project | /r/rust | 10 Dec 2023
Hi, What could be the best HLS tool for implementing neural networks on FPGA
2 projects | /r/FPGA | 13 Jun 2023
Use of Posh for frontend development?
9 projects | /r/Clojure | 9 May 2023
Any data flow visualization tools?
2 projects | /r/rust | 28 Apr 2023
Can anyone tell if Xilinx's FINN (from Xilinx's research lab) is restricted for use only to xilinx based FPGAs?
2 projects | /r/FPGA | 8 Apr 2023

Index

What are some of the best open-source Dataflow projects? This list will help you:

#	Project	Stars
1	Drawflow	4,126
2	umbrella	3,205
3	nextflow	2,538
4	Scio	2,520
5	pyt	2,161
6	flowistry	1,815
7	bearer	1,736
8	baklavajs	1,360
9	bytewax	1,144
10	ipyflow	1,073
11	scipipe	1,054
12	dora	998
13	RaftLib	923
14	streamly	847
15	pytm	836
16	NIPY	731
17	finn	661
18	relic	392
19	PothosCore	300
20	flowbase	161
21	blocks	151
22	ObservableComputations	108
23	entangle	105