-
hamilton
Discontinued A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton (by stitchfix)
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
This project reminds me a lot of Dask https://dask.org/. A library that allows delayed calculation of complex dataframes in Python.
My pynto https://github.com/punkbrwstr/pynto is a similar framework for creating dataframes, but using a concatenative paradigm that treats the frame as a stack of columns. Functions ("words") operate on the stack to set up the graph for each column, and execution happens afterwards in parallel. Instead of function modifiers like @does it uses combinators to apply quoted operations to multiple columns. The postfix syntax (think postscript or factor) is unambiguous, if a bit old-school.
This reminds me a bit of a Clojure library called Plumbing (formerly Graph): https://github.com/plumatic/plumbing. It also let you create a DAG for structured computation. It was used for a web service, at that time.
Having worked on "Dagger", you may be interested in https://github.com/timkpaine/tributary
Hamilton is more similar to the Prosto data processing toolkit which also relies on column operations defined via Python functions:
https://github.com/asavinov/prosto
However, Prosto allows for data processing via column operations in many tables (implemented as pandas data frames) by providing a column-oriented equivalents for joins and groupby (hence it has no joins and no groupbys which are known to be quite difficult and require high expertise).
Prosto also provides Column-SQL which might be simpler and more natural in many use cases.
The whole approach is based on the concept-oriented model of data which makes functions first-class elements of the model as opposed to having only sets in the relational model.
Related posts
-
Read files from s3 using Pandas/s3fs or AWS Data Wrangler?
-
The Distributed Tensor Algebra Compiler (2022)
-
Why are physics undergrads told to "learn programming" and what does this consist of?
-
We are the developers behind pandas, currently preparing for the 2.0 release :) AMA
-
What are the best Python libraries to learn for beginners?