Dataflow

Top 23 Dataflow Open-Source Projects

  • Drawflow

    Simple flow library 🖥️🖱️

  • Project mention: How to make beautiful flowchart with Angular ? | /r/Frontend | 2023-07-27

    ❌📄Drawflow - Seems nice, but no docs, and last commit was a year ago

  • umbrella

    ⛱ Broadly scoped ecosystem & mono-repository of 190 TypeScript projects (and 155 examples) for general purpose, functional, data driven development

  • nextflow

    A DSL for data-driven computational pipelines

  • Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

    > It's been a while since you can rerun/resume Nextflow pipelines

    Yes, you can resume, but you need your whole upstream DAG to be present. Snakemake can rerun a job when only the dependencies of that job are present, which lets you neatly manage disk usage, or archive an intermediate state of a project and rerun things from there.

    > and yes, you can have dry runs in Nextflow

    You have stubs, which really isn't the same thing.

    > I have no idea what you're referring to with the 'arbitrary limit of 1000 parallel jobs' though

    I was referring to this issue: https://github.com/nextflow-io/nextflow/issues/1871. Except, the discussion doesn't do the issue full justice. Nextflow spawns each job in a separate thread, and when it tries to spawn 1000+ condor jobs it dies with a cryptic error message. Using the options -Dnxf.pool.type=sync and -Dnxf.pool.maxThreads=N breaks the ability to resume: Nextflow attempts to rerun the pipeline instead.

    > As for deleting temporary files, there are features that allow you to do a few things related to that, and other features being implemented.

    There are some hacks for this - but nothing I would feel safe to integrate into a production tool. They are implementing something - you're right - and it's been the case for several years now, so we'll see.

    Snakemake has all that out of the box.
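
    The distinction drawn in the comment above (rerunning a job when only its direct dependencies are present versus needing the whole upstream DAG) can be illustrated with a toy model. This is only a sketch of the idea, not how Nextflow or Snakemake are actually implemented; the job names and the `DEPS` table are hypothetical.

```python
# Toy model only: not how Nextflow or Snakemake are implemented.
from typing import Dict, List, Set

# Hypothetical pipeline: each job maps to the jobs whose outputs it reads.
DEPS: Dict[str, List[str]] = {
    "align": [],
    "sort": ["align"],
    "call": ["sort"],
}

def upstream(job: str) -> Set[str]:
    """Return all transitive ancestors of `job` in the DAG."""
    seen: Set[str] = set()
    stack = list(DEPS[job])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPS[dep])
    return seen

def can_rerun_direct(job: str, present: Set[str]) -> bool:
    """Rerun is possible when the job's direct inputs are on disk."""
    return all(dep in present for dep in DEPS[job])

def can_rerun_full(job: str, present: Set[str]) -> bool:
    """Rerun is possible only when every upstream output is on disk."""
    return upstream(job) <= present

# The "align" output was deleted to save disk space; the "sort" output remains.
present = {"sort"}
print(can_rerun_direct("call", present))  # True
print(can_rerun_full("call", present))    # False
```

    Under the first rule, intermediate outputs can be deleted or archived once their direct consumers have what they need; under the second, every upstream output has to stay on disk for a rerun.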

  • Scio

    A Scala API for Apache Beam and Google Cloud Dataflow.

  • pyt

    A Static Analysis Tool for Detecting Security Vulnerabilities in Python Web Applications

  • flowistry

    Flowistry is an IDE plugin for Rust that helps you focus on relevant code.

  • Project mention: An IDE plugin for Rust that helps you focus on relevant code | news.ycombinator.com | 2023-12-14
  • bearer

    Code security scanning tool (SAST) to discover, filter and prioritize security and privacy risks.

  • Project mention: Show HN: Bearer Code Security Scanner Add Support for Java, PHP, Go, and Python | news.ycombinator.com | 2023-10-26
  • baklavajs

    Graph / node editor in the browser using VueJS

  • bytewax

    Python Stream Processing

  • Project mention: Building a streaming SQL engine with Arrow and DataFusion | news.ycombinator.com | 2024-03-18
  • ipyflow

    A reactive Python kernel for Jupyter notebooks.

  • Project mention: Show HN: Marimo – an open-source reactive notebook for Python | news.ycombinator.com | 2024-01-12

    You're probably referring to nbgather (https://github.com/microsoft/gather), which shipped with VSCode for a while.

    nbgather used static slicing to get all the code necessary to reconstruct some cell. I actually worked with Andrew Head (original nbgather author) and Shreya Shankar to implement something similar in ipyflow (but with dynamic slicing and a not-as-nice interface): https://github.com/ipyflow/ipyflow?tab=readme-ov-file#state-...

    I have no doubt something like this will make its way into marimo's roadmap at some point :)

  • scipipe

    Robust, flexible and resource-efficient pipelines using Go and the commandline

  • dora

    Low latency, composable, and distributed dataflow for AI and robotic applications (by dora-rs)

  • Project mention: Dora: Low latency, composable, and distributed dataflow for AI and robotic | news.ycombinator.com | 2024-03-21
  • RaftLib

    The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators

  • streamly

    High performance, concurrent functional programming abstractions

  • Project mention: [ANN] Haskell Streamly 0.9.0 Release! | /r/haskell | 2023-05-25

    https://github.com/composewell/streamly/issues/1307 seems related, but it was a long time ago. We weren't heavy users anyway, so our streaming philosophy is now "conduit if it's simple and plugging into a conduit-using library, streaming if you're doing complicated things".

  • pytm

    A Pythonic framework for threat modeling

  • NIPY

    Workflows and interfaces for neuroimaging packages

  • finn

    Dataflow compiler for QNN inference on FPGAs

  • Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13

    FINN - https://github.com/Xilinx/finn

  • relic

    Functional relational programming for Clojure(Script).

  • Project mention: FoundationDB: A Distributed Key-Value Store | news.ycombinator.com | 2023-07-03

    I've been tooling around with "Tuple Database", which claims to be FoundationDB for the frontend (by the original dev of Notion).

    https://github.com/ccorcos/tuple-database/

    I have found it conceptually similar to Relic or Datascript, but with strong performance guarantees - something Relic considers a potential issue. It also solves the problem of using reactive queries to trigger things like popups and fullscreen requests, which must be run in the same event loop as user input.

    https://github.com/wotbrew/relic

  • PothosCore

    The Pothos data-flow framework

  • flowbase

    A Flow-based Programming inspired micro-framework / un-framework for Go (Golang)

  • blocks

    Blocks. An online drag-and-drop smart contract builder. (by Blocks-Editor)

  • ObservableComputations

    Cross-platform .NET library for computations whose arguments and results are objects that implement INotifyPropertyChanged and INotifyCollectionChanged (ObservableCollection) interfaces.

  • entangle

    A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Dataflow projects? This list will help you:

Rank Project Stars
1 Drawflow 4,126
2 umbrella 3,205
3 nextflow 2,538
4 Scio 2,520
5 pyt 2,161
6 flowistry 1,815
7 bearer 1,736
8 baklavajs 1,360
9 bytewax 1,144
10 ipyflow 1,073
11 scipipe 1,054
12 dora 998
13 RaftLib 923
14 streamly 847
15 pytm 836
16 NIPY 731
17 finn 661
18 relic 392
19 PothosCore 300
20 flowbase 161
21 blocks 151
22 ObservableComputations 108
23 entangle 105
