Top 23 Dataflow Open-Source Projects

  • kestra

    Kestra is an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines.

    Project mention: Airflow's Problem | news.ycombinator.com | 2022-08-02

    But I totally agree that a large static dag is not appropriate in the actual data world with data mesh and domain responsibility.

    [0] https://github.com/kestra-io/kestra

  • Drawflow

    Simple flow library 🖥️🖱️

    Project mention: Improving drag'n'drop | reddit.com/r/vuejs | 2022-08-03

    I also found LeaderLine and a more complex one, Drawflow. The Drawflow one will be hard, as the project is in production and we didn't use canvas. We drag'n'drop on grid and flex lists.

  • umbrella

    ⛱ Broadly scoped ecosystem & mono-repository of 168+ TypeScript projects for functional, data driven development

    Project mention: Image from my current generative art project "Harmonium" | reddit.com/r/generative | 2022-06-23

    OP here. This project is implemented in Typescript and Svelte, with support from the thi.ng libraries.

  • Scio

    A Scala API for Apache Beam and Google Cloud Dataflow.

    Project mention: For the DE's that choose Java over Python in new projects, why? | reddit.com/r/dataengineering | 2022-06-02

    I doubt it is possible because I suspect that GIL would like a word. So I could spend nights trying to make it work in Python (and possibly, if not likely, fail). Or I could just use this ready made solution.

  • pyt

    A Static Analysis Tool for Detecting Security Vulnerabilities in Python Web Applications

  • nextflow

    A DSL for data-driven computational pipelines

    Project mention: Nextflow vs Snakemake | reddit.com/r/bioinformatics | 2022-07-29

    We could spend the day pointing to things we wish were different, but that doesn't change the fact that Nextflow is the leader when it comes to workflow orchestration. And feel free to create a new issue in the GitHub repository if you wish to request a feature :)

  • flowistry

    Flowistry is an IDE plugin for Rust that helps you focus on relevant code.

    Project mention: flowistry plugin? | reddit.com/r/neovim | 2022-09-10

  • RaftLib

    The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators

  • streamly

    Dataflow programming and declarative concurrency

    Project mention: Haskell Libraries I Love | reddit.com/r/haskell | 2022-05-30

    I want to like streamly, but the API is so huge, yet I feel like I'm doing things on a too low level of abstraction. (And as long as it needs a ghc plugin I doubt it'll become the de facto standard.) Though maybe I just haven't used it enough. It does have great docs at https://streamly.composewell.com/ and they seem to be taking both performance, dependency weight and API design quite seriously.

  • NIPY

    Workflows and interfaces for neuroimaging packages

  • pytm

    A Pythonic framework for threat modeling

    Project mention: Pytm | reddit.com/r/devopspro | 2022-03-09

  • baklavajs

    Graph / node editor in the browser using VueJS

  • hyperfiddle-2020

    CRUD apps as a function

    Project mention: [Research] Only 7% of web developers would use no-code/low-code tools to start web applications in 2022 | reddit.com/r/programming | 2022-06-16

    You might want to have a look at hyperfiddle

  • PothosCore

    The Pothos data-flow framework

  • flowbase

    A Flow-based Programming inspired micro-framework / un-framework for Go (Golang)

  • relic

    Functional relational programming for Clojure(Script).

    Project mention: ANN: relic - functional relational database and military grade anti-tar library. | reddit.com/r/Clojure | 2022-01-17

    I have recently cut the first alpha release of relic that I'm happy to share: https://github.com/wotbrew/relic.

  • entangle

    A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs.

  • ObservableComputations

    Cross-platform .NET library for computations whose arguments and results are objects that implement INotifyPropertyChanged and INotifyCollectionChanged (ObservableCollection) interfaces.

    Project mention: The only .NET open-source project at HighLoad++ conference | reddit.com/r/dotnet | 2022-02-21

    I am taking part in the contest of open-source projects. The prize is the opportunity to become a speaker at HighLoad++ conference in March this year in Moscow. The winner can present his open-source project. My project (ObservableComputations) is the only .NET at that contest. If you consider my project is best, please, vote for it. Otherwise, vote for another open-source project. Following is the link to vote (Facebook authorization is available):

  • flowgraph

    Flowgraph package for scalable asynchronous system development (by vectaport)

  • DFiant

    DFiant: A Dataflow Hardware Description Language

    Project mention: FCCM'22 Tutorial: Recent Developments in Hardware Description Languages | reddit.com/r/FPGA | 2022-04-05

  • flowsaber

    Dataflow based workflow framework

  • Tesseract

    A set of libraries for rapidly developing Pipeline driven micro/macroservices. (by houseofcat)

    Project mention: Best way to process large amount of Tasks? | reddit.com/r/csharp | 2022-08-19

    If you need more advanced stuff check out my Dataflows. https://github.com/houseofcat/tesseract

  • prefect-deployment-patterns

    Code examples showing flow deployment to various types of infrastructure

    Project mention: [D] Should I go with Prefect, Argo or Flyte for Model Training and ML workflow orchestration? | reddit.com/r/MachineLearning | 2022-09-26

    Have you used infrastructure blocks in Prefect? You could easily build a block for Sagemaker deploying infrastructure for the flow running with GPUs, then run other flow in a local process, yet another one as Kubernetes job, Docker container, ECS task, AWS batch, etc. Super easy to set up, even from the UI or from CI/CD. There are a bunch of templates and examples here: https://github.com/anna-geller/prefect-deployment-patterns

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-09-26.

Index

What are some of the best open-source Dataflow projects? This list will help you:

 #  Project                      Stars
 1  kestra                       2,710
 2  Drawflow                     2,606
 3  umbrella                     2,577
 4  Scio                         2,379
 5  pyt                          2,096
 6  nextflow                     1,838
 7  flowistry                    1,299
 8  RaftLib                        816
 9  streamly                       743
10  NIPY                           648
11  pytm                           602
12  baklavajs                      584
13  hyperfiddle-2020               557
14  PothosCore                     269
15  flowbase                       146
16  relic                          128
17  entangle                       100
18  ObservableComputations          88
19  flowgraph                       51
20  DFiant                          44
21  flowsaber                       37
22  Tesseract                       31
23  prefect-deployment-patterns     27