Show HN: Hamilton, a Microframework for Creating Dataframes

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. hamilton

    Discontinued. A scalable, general-purpose micro-framework for defining dataflows. This repository has been moved to www.github.com/dagworks-inc/hamilton (by stitchfix)

  3. Dask

    Parallel computing with task scheduling

    This project reminds me a lot of Dask (https://dask.org/), a library that allows delayed calculation of complex dataframes in Python.
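
The "delayed calculation" idea in the comment above can be illustrated without Dask itself. What follows is a minimal pure-Python sketch, not Dask's implementation: the names `Delayed`, `delayed`, and `compute` are chosen for this illustration, and plain lists stand in for dataframe columns.

```python
class Delayed:
    """A node in a lazily built task graph (hypothetical, not Dask's class)."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def compute(self):
        # Evaluate any delayed arguments first, then call this node's function.
        resolved = [a.compute() if isinstance(a, Delayed) else a for a in self.args]
        return self.func(*resolved)


def delayed(func):
    """Wrap func so that calling it records the call instead of running it."""
    def wrapper(*args):
        return Delayed(func, args)
    return wrapper


calls = []  # records the order in which the functions really execute


@delayed
def load():
    calls.append("load")
    return [1, 2, 3, 4]


@delayed
def double(column):
    calls.append("double")
    return [2 * x for x in column]


@delayed
def total(column):
    calls.append("total")
    return sum(column)


graph = total(double(load()))  # builds the graph; no function body has run
assert calls == []
result = graph.compute()       # runs load, then double, then total
```

Building the graph and running it are separate steps, which is what gives a scheduler room to reorder or parallelize work. This sketch simply evaluates the graph recursively in one thread.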

  4. pynto

    Time series analysis in Python using the concatenative paradigm

    My pynto (https://github.com/punkbrwstr/pynto) is a similar framework for creating dataframes, but it uses a concatenative paradigm that treats the frame as a stack of columns. Functions ("words") operate on the stack to set up the graph for each column, and execution happens afterwards in parallel. Instead of function modifiers like @does, it uses combinators to apply quoted operations to multiple columns. The postfix syntax (think PostScript or Factor) is unambiguous, if a bit old-school.
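
To make the "stack of columns" idea concrete, here is a rough pure-Python sketch of a concatenative pipeline. It does not use pynto's API: the words `const`, `row_index`, `add`, the combinator `each`, and the helpers `run` and `evaluate` are all hypothetical names, and evaluation here is sequential rather than parallel.

```python
def const(value):
    """Word: push a column that yields `value` for every row."""
    def word(stack):
        stack.append(lambda rows: [value] * rows)
    return word


def row_index(stack):
    """Word: push a column holding the row numbers 0..rows-1."""
    stack.append(lambda rows: list(range(rows)))


def add(stack):
    """Word: pop two columns and push their element-wise sum (still deferred)."""
    right, left = stack.pop(), stack.pop()
    stack.append(lambda rows: [a + b for a, b in zip(left(rows), right(rows))])


def each(quotation):
    """Combinator: apply a quoted program to every column on the stack."""
    def word(stack):
        columns = list(stack)
        stack.clear()
        for column in columns:
            sub_stack = [column]
            run(quotation, sub_stack)
            stack.extend(sub_stack)
    return word


def run(program, stack=None):
    """Apply words left to right (postfix order); this only builds deferred columns."""
    stack = [] if stack is None else stack
    for word in program:
        word(stack)
    return stack


def evaluate(stack, rows):
    """Execute every deferred column for the requested number of rows."""
    return [column(rows) for column in stack]


# Postfix program: row_index 10 add 1 [100 add] each
program = [row_index, const(10), add, const(1), each([const(100), add])]
frame = evaluate(run(program), rows=3)
```

Here `run` only composes closures, so nothing is computed until `evaluate` is called; the quoted list passed to `each` plays the role of a quotation applied to multiple columns.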

  5. plumbing

    Prismatic's Clojure(Script) utility belt

    This reminds me a bit of a Clojure library called Plumbing (formerly Graph): https://github.com/plumatic/plumbing. It also lets you create a DAG for structured computation. At the time, it was used for a web service.
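
The shared idea, declaring a DAG by letting each function's parameter names refer to other nodes, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the API of Plumbing or Hamilton: `build_graph` and `execute` are hypothetical helpers, and the statistics functions are just example nodes.

```python
import inspect


def build_graph(functions):
    """Map each function name to (function, names of the nodes it depends on)."""
    return {f.__name__: (f, list(inspect.signature(f).parameters)) for f in functions}


def execute(graph, inputs, wanted):
    """Compute the wanted nodes, resolving dependencies on demand and caching them."""
    cache = dict(inputs)

    def resolve(name, visiting=()):
        if name in cache:
            return cache[name]
        if name not in graph:
            raise KeyError(f"no input or function provides {name!r}")
        if name in visiting:
            raise ValueError(f"cycle detected at {name!r}")
        func, deps = graph[name]
        kwargs = {dep: resolve(dep, visiting + (name,)) for dep in deps}
        cache[name] = func(**kwargs)
        return cache[name]

    return {name: resolve(name) for name in wanted}


# Example nodes: the function name is the output, the parameters are the inputs.
def n(xs):
    return len(xs)


def mean(xs, n):
    return sum(xs) / n


def mean_square(xs, n):
    return sum(x * x for x in xs) / n


def variance(mean, mean_square):
    return mean_square - mean * mean


graph = build_graph([n, mean, mean_square, variance])
stats = execute(graph, inputs={"xs": [1, 2, 3, 6]}, wanted=["n", "mean", "variance"])
```

Because the dependencies are read from the signatures, the execution order never has to be written down, and only the nodes needed for the requested outputs are computed.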

  6. tributary

    Streaming reactive and dataflow graphs in Python

    Having worked on "Dagger", you may be interested in https://github.com/timkpaine/tributary

  7. prosto

    Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

    Hamilton is more similar to the Prosto data processing toolkit, which also relies on column operations defined via Python functions:

    https://github.com/asavinov/prosto

    However, Prosto supports data processing via column operations across many tables (implemented as pandas data frames) by providing column-oriented equivalents for joins and groupby. Hence it has no joins and no groupbys, which are known to be quite difficult and to require high expertise.

    Prosto also provides Column-SQL, which might be simpler and more natural in many use cases.

    The whole approach is based on the concept-oriented model of data, which makes functions first-class elements of the model, as opposed to the relational model, which has only sets.
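
As a rough illustration of replacing joins and groupby with column operations, the sketch below uses plain Python dictionaries of lists as tables. It is not Prosto's API: `link_column`, `calculate_column`, and `aggregate_column` are hypothetical helpers, and the `products` and `orders` tables are made-up data for the example.

```python
# Two tables stored column-wise (made-up example data).
products = {"name": ["apple", "pear"], "price": [2.0, 3.0]}
orders = {"product_name": ["pear", "apple", "pear"], "quantity": [1, 4, 2]}


def link_column(source_keys, target_keys):
    """Instead of a join: for each source row, store the matching target row number."""
    position = {key: i for i, key in enumerate(target_keys)}
    return [position.get(key) for key in source_keys]


def calculate_column(func, *columns):
    """Define a new column by applying func to the values of each row."""
    return [func(*values) for values in zip(*columns)]


def aggregate_column(links, values, size, func, initial):
    """Instead of groupby: fold the linked rows' values into a target-table column."""
    result = [initial] * size
    for target_row, value in zip(links, values):
        if target_row is not None:
            result[target_row] = func(result[target_row], value)
    return result


# Link each order to its product row, then compute through the link.
orders["product"] = link_column(orders["product_name"], products["name"])
orders["amount"] = calculate_column(
    lambda row, quantity: products["price"][row] * quantity,
    orders["product"],
    orders["quantity"],
)
products["revenue"] = aggregate_column(
    orders["product"], orders["amount"], len(products["name"]),
    lambda total, amount: total + amount, 0.0,
)
```

No combined table is ever materialized: the link column lets `orders` reach into `products`, and the aggregate column sends values back the other way.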

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


