flow vs streaming-consistency

flow

Computational parallel flows on top of GenStage (by dashbitco)

Algorithms and Data structures

Source Code

hexdocs.pm

streaming-consistency

Demonstrations of (in)consistency in various streaming systems. (by jamii)

DISCONTINUED

Project         flow                 streaming-consistency
Mentions        2                    3
Stars           1,479                19
Growth          0.5%                 -
Activity        3.4                  1.8
Latest Commit   10 months ago        about 3 years ago
Language        Elixir               Java
License         Apache License 2.0   -

The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

flow

Posts with mentions or reviews of flow. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-09.

Switching to Elixir
11 projects | news.ycombinator.com | 9 Nov 2023

You can actually have "background jobs" in very different ways in Elixir.
> I want background work to live on different compute capacity than http requests, both because they have very different resources usage
In Elixir, because of the way the BEAM works (the unit of parallelism is much cheaper and consumes little memory), "incoming http requests" and related "workers" are far less expensive than in other stacks such as Ruby and Python. In those stacks it is quite critical to release "http workers" and not hold the connection, which is what led to the creation of background job tools like Resque, DelayedJob, Sidekiq, Celery...
This means that you can actually hold incoming HTTP connections a lot longer without trouble.
A consequence of this is that implementing "reverse proxies", or anything calling third-party servers _right in the middle_ of your own HTTP call, is usually perfectly acceptable (something I've done more than a couple of times, the latest one powering the reverse proxy behind https://transport.data.gouv.fr; code available at https://github.com/etalab/transport-site/tree/master/apps/un...).
In other words, what would be a bad pattern in Python or Ruby (holding the incoming HTTP connection) is not a problem with Elixir.
> because I want to have state or queues in front of background work so there's a well-defined process for retry, error handling, and back-pressure.
Beyond immediate work like reverse proxying or cheap "one-off async tasks" (like recording a metric), Elixir also has solutions for more "stateful" background work.
A popular background job queue is https://github.com/sorentwo/oban (roughly similar to Sidekiq et al.), which uses Postgres.
It handles retries, errors, etc.
But it's not the only solution: there are other tools dedicated to processing, such as Broadway (https://github.com/dashbitco/broadway), which handles back-pressure, fault tolerance, batching, etc. natively.
There are also simpler options, such as flow (https://github.com/dashbitco/flow), gen_stage (https://github.com/elixir-lang/gen_stage), Task.async_stream (https://hexdocs.pm/elixir/1.12/Task.html#async_stream/5), etc.
This makes it quite easy to use the "right tool for the job".
It is also interesting to note that there is no need to "go evented" if you need to fetch data from multiple HTTP servers: it can happen in the exact same process (even in a background task attached to your HTTP server), as done at https://transport.data.gouv.fr/explore (if you zoom in, you will see vehicles moving in real time; ~80 data sources are polled every 10 seconds and broadcast to visitors via pubsub and websockets).

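Task.async_stream runs a function over a collection with bounded concurrency (its max_concurrency option) and, by default, returns results in input order. As a rough cross-language analog only (Python instead of Elixir, and OS threads, which are much heavier than BEAM processes), polling many data sources from one place might look like the sketch below. poll_source and the concurrency limit of 8 are hypothetical stand-ins, not part of the setup described above.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def poll_source(source_id: int) -> str:
    """Hypothetical stand-in for fetching one data source over HTTP."""
    time.sleep(0.01)  # simulate network latency
    return f"source-{source_id}: ok"


def poll_all(source_ids, max_concurrency: int = 8) -> list:
    # At most max_concurrency polls run at the same time.
    # Executor.map yields results in input order, similar in spirit
    # to Task.async_stream's default ordered behaviour.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(poll_source, source_ids))


if __name__ == "__main__":
    results = poll_all(range(80))
    print(len(results), results[0])
```

Keeping the results ordered makes it easy to match each response back to its source; a polling loop would simply call poll_all on a timer.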
An opinionated map of incremental and streaming systems (2018)
4 projects | news.ycombinator.com | 4 May 2021

Elixir has a few interesting abstractions for that: GenStage, Flow, Broadway.
https://github.com/dashbitco/flow
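What sets Flow apart from a plain parallel map is partitioning: events are routed by key so that every occurrence of a given key is reduced by the same stage, which lets reductions run in parallel without conflicting partial results. The Python sketch below illustrates that idea with word counting. It is not Flow's API; the partition count and the crc32-based routing are illustrative assumptions, and the map step is sequential for brevity.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib


def partition_of(word: str, num_partitions: int) -> int:
    # Stable hash, so the same word always lands in the same partition.
    return zlib.crc32(word.encode("utf-8")) % num_partitions


def word_count(lines, num_partitions: int = 4) -> dict:
    # Map step (sequential here): split lines and route each word by key.
    partitions = [[] for _ in range(num_partitions)]
    for line in lines:
        for word in line.split():
            partitions[partition_of(word, num_partitions)].append(word)

    # Reduce step: each partition is counted independently and in parallel.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(Counter, partitions))

    # A word never appears in two partitions, so the key sets are disjoint
    # and a plain dict update merges them without summing.
    result = {}
    for counts in partial_counts:
        result.update(counts)
    return result


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
```

Without the partitioning step, each worker could hold a partial count for the same word, and the final merge would have to add those counts together.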

streaming-consistency

Posts with mentions or reviews of streaming-consistency. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-05-04.

The Query Your Database Can’t Answer
1 project | news.ycombinator.com | 4 Jun 2021

Anyone thinking about using Confluent as some kind of alternative to a database should read this blog post outlining the myriad correctness problems with ksqlDB: https://scattered-thoughts.net/writing/internal-consistency-...
An opinionated map of incremental and streaming systems (2018)
4 projects | news.ycombinator.com | 4 May 2021

Spark structured streaming is in there under structured, high temporal locality.
It didn't make it into https://scattered-thoughts.net/writing/internal-consistency-... because it has severe limitations for low temporal locality operations:
> * As of Spark 2.4, you can use joins only when the query is in Append output mode. Other output modes are not yet supported.
Internal Consistency in Streaming Systems
2 projects | news.ycombinator.com | 18 Apr 2021

> And then try to join credits and debits together by updating_tx.
You can't join on updating_tx because the credits and debits per account are disjoint sets of transactions: that join will never produce output.
I did try something similar with timestamps: https://github.com/jamii/streaming-consistency/blob/main/fli.... This is also wrong (because the timestamps don't have to match between credits and debits), but it at least produces output. It had a very similar error distribution to the original.
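The disjointness argument can be checked on a toy example. This is a minimal sketch assuming a hypothetical transfers table with from_account, to_account and amount columns (not necessarily the schema used in the repository), and assuming no transfer moves money from an account to itself. It records, per account, which transfer last updated its credits and which last updated its debits, then joins the two on account and updating transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    tx_id: int
    from_account: str
    to_account: str
    amount: int


def latest_updates(transfers):
    """Per account: id of the last transfer that updated its credits,
    and id of the last transfer that updated its debits."""
    credits, debits = {}, {}
    for t in transfers:
        credits[t.to_account] = t.tx_id    # money in updates credits
        debits[t.from_account] = t.tx_id   # money out updates debits
    return credits, debits


def join_on_updating_tx(credits, debits):
    # Keep only accounts whose credits and debits were last updated
    # by the same transfer.
    return [
        (account, tx)
        for account, tx in credits.items()
        if debits.get(account) == tx
    ]


if __name__ == "__main__":
    transfers = [
        Transfer(1, "alice", "bob", 10),
        Transfer(2, "bob", "carol", 5),
        Transfer(3, "carol", "alice", 7),
    ]
    credits, debits = latest_updates(transfers)
    print(join_on_updating_tx(credits, debits))  # []
```

A transfer only ever updates the credits of its to_account and the debits of its from_account, so for a single account the two sides can share a transaction id only if that transfer is a self-transfer. That is why the join comes back empty.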

What are some alternatives?

When comparing flow and streaming-consistency you can also consider the following projects:

parallel_stream - A parallelized stream implementation for Elixir

lasp - Prototype implementation of Lasp in Erlang.

MapDiff - Calculates the difference between two (nested) maps, and returns a map representing the patch of changes.

Pravega - Streaming as a new software-defined storage primitive

fsm - Finite State Machine data structure

differential-datalog - DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.

graphmath - An Elixir library for performing 2D and 3D mathematics.

witchcraft - Monads and other dark magic for Elixir

matrex - A blazing fast matrix library for Elixir/Erlang with C implementation using CBLAS.

erlang-algorithms - Implementations of popular data structures and algorithms

qex - Queue data structure for Elixir-lang

fuse - A Circuit Breaker for Erlang