Show HN: Hamilton, a Microframework for Creating Dataframes


hamilton

26 878 8.1 Python

Discontinued. A scalable, general-purpose micro-framework for defining dataflows. This repository has moved to www.github.com/dagworks-inc/hamilton (by stitchfix)

Dask

32 12,022 9.6 Python

Parallel computing with task scheduling

This project reminds me a lot of Dask (https://dask.org/), a library that allows delayed calculation of complex dataframes in Python.
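
To make "delayed calculation" concrete, here is a minimal pure-Python sketch of the idea. It is not Dask's API: the `Delayed` class and the `delayed` decorator are hypothetical names used only for illustration. Building the expression records a task graph, and no step runs until `compute()` is called.

```python
from typing import Any, Callable


class Delayed:
    """A recorded function call whose evaluation is deferred (hypothetical helper)."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        self.func = func
        self.args = args

    def compute(self) -> Any:
        # Evaluate upstream Delayed arguments first, then call the function.
        resolved = [a.compute() if isinstance(a, Delayed) else a for a in self.args]
        return self.func(*resolved)


def delayed(func: Callable[..., Any]) -> Callable[..., Delayed]:
    """Wrap func so that calling it builds a graph node instead of running it."""
    def wrapper(*args: Any) -> Delayed:
        return Delayed(func, *args)
    return wrapper


calls = []  # records which steps actually ran, and in what order


@delayed
def load(n):
    calls.append("load")
    return list(range(n))


@delayed
def double(column):
    calls.append("double")
    return [2 * x for x in column]


@delayed
def total(column):
    calls.append("total")
    return sum(column)


graph = total(double(load(4)))  # only builds the graph; no step has run yet
```

Calling `graph.compute()` then runs `load`, `double`, and `total` in dependency order. This sketch does no caching or parallel scheduling; it only shows the separation between describing a computation and executing it.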

pynto

1 6 6.1 Python

Time series analysis in Python using the concatenative paradigm

My pynto (https://github.com/punkbrwstr/pynto) is a similar framework for creating dataframes, but it uses a concatenative paradigm that treats the frame as a stack of columns. Functions ("words") operate on the stack to set up the graph for each column, and execution happens afterwards in parallel. Instead of function modifiers like @does, it uses combinators to apply quoted operations to multiple columns. The postfix syntax (think PostScript or Factor) is unambiguous, if a bit old-school.
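
The stack-of-columns idea can be sketched in a few lines of plain Python. This is not pynto's API: the words `const`, `row_index`, `add`, the combinator `each`, and the `evaluate` function are all hypothetical names. Words only manipulate a stack of deferred columns, and the columns are computed afterwards (sequentially here, whereas the comment above describes parallel execution).

```python
from typing import Callable, List

Column = Callable[[int], List[float]]  # given a row count, produce the column values
Word = Callable[[List[Column]], None]  # a word mutates the stack of columns


def const(value: float) -> Word:
    """Word that pushes a constant column."""
    def word(stack: List[Column]) -> None:
        stack.append(lambda rows: [value] * rows)
    return word


def row_index(stack: List[Column]) -> None:
    """Word that pushes a column holding 0, 1, 2, ..."""
    stack.append(lambda rows: [float(i) for i in range(rows)])


def add(stack: List[Column]) -> None:
    """Word that pops two columns and pushes their element-wise sum."""
    right, left = stack.pop(), stack.pop()
    stack.append(lambda rows: [a + b for a, b in zip(left(rows), right(rows))])


def each(*quotation: Word) -> Word:
    """Combinator: apply the quoted words to every column on the stack separately."""
    def word(stack: List[Column]) -> None:
        results: List[Column] = []
        for column in stack:
            sub = [column]
            for quoted in quotation:
                quoted(sub)
            results.extend(sub)
        stack[:] = results
    return word


def evaluate(program: List[Word], rows: int) -> List[List[float]]:
    """Run the words to build the columns, then compute each column."""
    stack: List[Column] = []
    for word in program:
        word(stack)
    return [column(rows) for column in stack]


# Postfix program: push two columns, then add 10 to each of them.
program = [row_index, const(5.0), each(const(10.0), add)]
table = evaluate(program, rows=3)
```

Here `each(const(10.0), add)` plays the role of a quoted operation applied to multiple columns: the quotation runs once per column instead of once on the whole stack.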

plumbing

2 1,483 0.0 Clojure

Prismatic's Clojure(Script) utility belt

This reminds me a bit of a Clojure library called Plumbing (formerly Graph): https://github.com/plumatic/plumbing. It also lets you create a DAG for structured computation. At the time, it was used for a web service.
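
One way to declare such a DAG, sketched here in Python rather than Clojure, is to let each function's parameter names identify the nodes it depends on. The `run_graph` helper and the node functions below are hypothetical and are not the Plumbing or Hamilton API; they only illustrate the general idea.

```python
import inspect
from typing import Any, Callable, Dict


def run_graph(nodes: Dict[str, Callable[..., Any]], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Compute every node, resolving dependencies by parameter name."""
    results: Dict[str, Any] = dict(inputs)
    in_progress = set()

    def resolve(name: str) -> Any:
        if name in results:
            return results[name]  # already computed, or supplied as an input
        if name not in nodes:
            raise KeyError(f"no input or node named {name!r}")
        if name in in_progress:
            raise ValueError(f"cycle detected at {name!r}")
        in_progress.add(name)
        func = nodes[name]
        # Each parameter name refers to another node or to an input.
        kwargs = {param: resolve(param) for param in inspect.signature(func).parameters}
        in_progress.discard(name)
        results[name] = func(**kwargs)
        return results[name]

    for name in nodes:
        resolve(name)
    return results


def n(xs):
    return len(xs)


def mean(xs, n):
    return sum(xs) / n


def mean_square(xs, n):
    return sum(x * x for x in xs) / n


def variance(mean, mean_square):
    return mean_square - mean * mean


stats = run_graph(
    {"n": n, "mean": mean, "mean_square": mean_square, "variance": variance},
    {"xs": [1, 2, 3, 6]},
)
```

Because the dependencies live in the signatures, the order in which the nodes are listed does not matter, and each node is computed at most once.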

tributary

3 24 6.7 Python

Streaming reactive and dataflow graphs in Python

Having worked on "Dagger", you may be interested in https://github.com/timkpaine/tributary

prosto

9 89 3.6 Python

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Hamilton is more similar to the Prosto data processing toolkit which also relies on column operations defined via Python functions:
https://github.com/asavinov/prosto
However, Prosto allows for data processing via column operations across many tables (implemented as pandas data frames) by providing column-oriented equivalents for joins and groupby (hence it has no joins and no groupbys, which are known to be quite difficult and to require high expertise).
Prosto also provides Column-SQL which might be simpler and more natural in many use cases.
The whole approach is based on the concept-oriented model of data which makes functions first-class elements of the model as opposed to having only sets in the relational model.
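
As a rough illustration of replacing joins and groupby with column definitions, the sketch below uses plain lists of dicts instead of pandas data frames. It is not Prosto's API: `calculate`, `link`, and `aggregate` are hypothetical helper names chosen for this example, and the sample data is made up.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = List[Row]


def calculate(table: Table, name: str, func: Callable[[Row], Any]) -> None:
    """Add a column whose value is computed from each row of the same table."""
    for row in table:
        row[name] = func(row)


def link(facts: Table, name: str, key: str, target: Table, target_key: str) -> None:
    """Add a column pointing each fact row at its matching target row (stands in for a join)."""
    index = {row[target_key]: row for row in target}
    for row in facts:
        row[name] = index.get(row[key])


def aggregate(target: Table, name: str, facts: Table, link_name: str,
              func: Callable[[List[Row]], Any]) -> None:
    """Add a column to target computed from the fact rows linked to each
    target row (stands in for a group-by)."""
    for target_row in target:
        linked = [row for row in facts if row[link_name] is target_row]
        target_row[name] = func(linked)


products = [{"product": "tea"}, {"product": "coffee"}]
sales = [
    {"product": "tea", "quantity": 2, "price": 3.0},
    {"product": "coffee", "quantity": 1, "price": 4.5},
    {"product": "tea", "quantity": 4, "price": 3.0},
]

# Every step defines a new column; no table is joined or regrouped.
calculate(sales, "amount", lambda row: row["quantity"] * row["price"])
link(sales, "product_row", key="product", target=products, target_key="product")
aggregate(products, "total_amount", sales, "product_row",
          lambda rows: sum(row["amount"] for row in rows))
```

After these three calls, each row of `products` carries a `total_amount` column derived from the linked rows of `sales`.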

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Read files from s3 using Pandas/s3fs or AWS Data Wrangler?
3 projects | /r/dataengineering | 6 Dec 2023

The Distributed Tensor Algebra Compiler (2022)
4 projects | news.ycombinator.com | 15 Jun 2023

Why are physics undergrads told to "learn programming" and what does this consist of?
2 projects | /r/PhysicsStudents | 19 May 2023

We are the developers behind pandas, currently preparing for the 2.0 release :) AMA
9 projects | /r/Python | 1 Mar 2023

What are the best Python libraries to learn for beginners?
4 projects | /r/Python | 31 Jan 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Python Pandas Science and Data analysis Workflow Data processing
Post date: 8 Nov 2021
