Top 23 Dataflow Open-Source Projects
-
umbrella
⛱ Broadly scoped ecosystem & mono-repository of 190 TypeScript projects (and 155 examples) for general purpose, functional, data driven development
-
bearer
Code security scanning tool (SAST) to discover, filter and prioritize security and privacy risks.
-
ObservableComputations
Cross-platform .NET library for computations whose arguments and results are objects that implement INotifyPropertyChanged and INotifyCollectionChanged (ObservableCollection) interfaces.
-
entangle
A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs.
❌📄Drawflow - Seems nice, but no docs, and last commit was a year ago
> It's been a while since you can rerun/resume Nextflow pipelines
Yes, you can resume, but you need your whole upstream DAG to be present. Snakemake can rerun a job when only the direct dependencies of that job are present, which lets you neatly manage disk usage, or archive an intermediate state of a project and rerun things from there.
> and yes, you can have dry runs in Nextflow
You have stubs, which really isn't the same thing.
> I have no idea what you're referring to with the 'arbitrary limit of 1000 parallel jobs' though
I was referring to this issue: https://github.com/nextflow-io/nextflow/issues/1871. Except, the discussion doesn't do the issue full justice. Nextflow spawns each job in a separate thread, and when it tries to spawn 1000+ condor jobs it dies with a cryptic error message. The options -Dnxf.pool.type=sync and -Dnxf.pool.maxThreads=N remove the ability to resume; Nextflow attempts to rerun the pipeline instead.
> As for deleting temporary files, there are features that allow you to do a few things related to that, and other features being implemented.
There are some hacks for this - but nothing I would feel safe to integrate into a production tool. They are implementing something - you're right - and it's been the case for several years now, so we'll see.
Snakemake has all that out of the box.
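To make the rerun point above concrete, here is a minimal Python sketch of a make-style freshness check that looks only at a job's direct inputs, so anything further upstream can be deleted or archived. This is an illustration of the idea, not Snakemake's actual implementation; the function name `needs_rerun` and the file names in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Sequence


def needs_rerun(output: Path, inputs: Sequence[Path]) -> bool:
    """Decide whether a job must run, looking only at its direct inputs.

    Hypothetical helper: files further upstream in the DAG are never
    inspected, so they do not need to exist on disk.
    """
    missing = [p for p in inputs if not p.exists()]
    if missing:
        # Only the direct dependencies are required to be present.
        raise FileNotFoundError(f"missing direct inputs: {missing}")
    if not output.exists():
        # Nothing has been produced yet, so the job has to run.
        return True
    out_mtime = output.stat().st_mtime
    # Rerun when any direct input is newer than the existing output.
    return any(p.stat().st_mtime > out_mtime for p in inputs)


# Usage (hypothetical file names):
#   needs_rerun(Path("counts.tsv"), [Path("aligned.bam")])
```

A resume mechanism that needs the whole upstream DAG would instead have to walk every ancestor of the job before making this decision.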
Project mention: An IDE plugin for Rust that helps you focus on relevant code | news.ycombinator.com | 2023-12-14
Project mention: Show HN: Bearer Code Security Scanner Add Support for Java, PHP, Go, and Python | news.ycombinator.com | 2023-10-26
Project mention: Building a streaming SQL engine with Arrow and DataFusion | news.ycombinator.com | 2024-03-18
Project mention: Show HN: Marimo – an open-source reactive notebook for Python | news.ycombinator.com | 2024-01-12
You're probably referring to nbgather (https://github.com/microsoft/gather), which shipped with VSCode for a while.
nbgather used static slicing to get all the code necessary to reconstruct some cell. I actually worked with Andrew Head (original nbgather author) and Shreya Shankar to implement something similar in ipyflow (but with dynamic slicing and a not-as-nice interface): https://github.com/ipyflow/ipyflow?tab=readme-ov-file#state-...
I have no doubt something like this will make its way into marimo's roadmap at some point :)
Project mention: Dora: Low latency, composable, and distributed dataflow for AI and robotic | news.ycombinator.com | 2024-03-21
https://github.com/composewell/streamly/issues/1307 seems related, but it was a long time ago. We weren't heavy users anyway, so our streaming philosophy is now "conduit if it's simple and plugging into a conduit-using library, streaming if you're doing complicated things".
Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13
FINN - https://github.com/Xilinx/finn
I've been tooling around with "Tuple Database", which claims to be FoundationDB for the frontend (by the original dev of Notion).
https://github.com/ccorcos/tuple-database/
I have found it conceptually similar to Relic or Datascript, but with strong performance guarantees, something Relic considers a potential issue. It also solves the problem of using reactive queries to trigger things like popups and fullscreen requests, which must be run in the same event loop as user input.
https://github.com/wotbrew/relic
Dataflow related posts
- Dora: Low latency, composable, and distributed dataflow for AI and robotic
- An IDE plugin for Rust that helps you focus on relevant code
- Flowistry: an IDE plugin that analyzes the information flow of Rust programs, showing whether it's possible for one piece of code to affect another
- Hi, What could be the best HLS tool for implementing neural networks on FPGA
- Use of Posh for frontend development?
- Any data flow visualization tools?
- Can anyone tell if Xilinx's FINN (from Xilinx's research lab) is restricted for use only to xilinx based FPGAs?
Index
What are some of the best open-source Dataflow projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | Drawflow | 4,126 |
2 | umbrella | 3,205 |
3 | nextflow | 2,538 |
4 | Scio | 2,520 |
5 | pyt | 2,161 |
6 | flowistry | 1,815 |
7 | bearer | 1,736 |
8 | baklavajs | 1,360 |
9 | bytewax | 1,144 |
10 | ipyflow | 1,073 |
11 | scipipe | 1,054 |
12 | dora | 998 |
13 | RaftLib | 923 |
14 | streamly | 847 |
15 | pytm | 836 |
16 | NIPY | 731 |
17 | finn | 661 |
18 | relic | 392 |
19 | PothosCore | 300 |
20 | flowbase | 161 |
21 | blocks | 151 |
22 | ObservableComputations | 108 |
23 | entangle | 105 |