Show HN: Hamilton, a Microframework for Creating Dataframes

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • hamilton

    Discontinued. A scalable, general-purpose micro-framework for defining dataflows. This repository has been moved to www.github.com/dagworks-inc/hamilton (by stitchfix)

  • Dask

    Parallel computing with task scheduling

  • This project reminds me a lot of Dask (https://dask.org/), a library that allows delayed calculation of complex dataframes in Python.

  • pynto

    Time series analysis in Python using the concatenative paradigm

  • My pynto (https://github.com/punkbrwstr/pynto) is a similar framework for creating dataframes, but it uses a concatenative paradigm that treats the frame as a stack of columns. Functions ("words") operate on the stack to set up the graph for each column, and execution happens afterwards in parallel. Instead of function modifiers like @does, it uses combinators to apply quoted operations to multiple columns. The postfix syntax (think PostScript or Factor) is unambiguous, if a bit old-school.

  • plumbing

    Prismatic's Clojure(Script) utility belt

  • This reminds me a bit of a Clojure library called Plumbing (formerly Graph): https://github.com/plumatic/plumbing. It also lets you create a DAG for structured computation. At the time, it was used for a web service.

  • tributary

    Streaming reactive and dataflow graphs in Python

  • Having worked on "Dagger", you may be interested in https://github.com/timkpaine/tributary

  • prosto

    Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

  • Hamilton is more similar to the Prosto data processing toolkit which also relies on column operations defined via Python functions:

    https://github.com/asavinov/prosto

    However, Prosto allows data processing via column operations across many tables (implemented as pandas data frames) by providing column-oriented equivalents of joins and groupby. Hence it has no joins and no groupbys, which are known to be quite difficult and to require considerable expertise.

    Prosto also provides Column-SQL which might be simpler and more natural in many use cases.

    The whole approach is based on the concept-oriented model of data, which makes functions first-class elements of the model, as opposed to the relational model, which has only sets.
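
Several of the comments above describe the same basic idea: columns defined by plain Python functions, with the dependencies between them forming a DAG. The following is a minimal sketch of that idea using only the standard library. It is not the API of Hamilton, Prosto, or any other project listed here. The column names, the `build_columns` helper, and the convention that a function's name is its output column and its parameter names are its input columns are all assumptions made for illustration.

```python
import inspect

# Hypothetical column definitions: each function's name is the column it
# produces, and each parameter name is a column it depends on.
def spend_per_signup(spend, signups):
    return [s / n for s, n in zip(spend, signups)]

def spend_per_signup_rounded(spend_per_signup):
    return [round(v, 2) for v in spend_per_signup]

def build_columns(functions, inputs, wanted):
    """Compute the requested columns, treating parameter names as dependencies."""
    by_name = {f.__name__: f for f in functions}
    computed = dict(inputs)  # raw input columns are already "computed"

    def resolve(name, visiting=()):
        if name in computed:
            return computed[name]
        if name in visiting:
            raise ValueError(f"cyclic dependency at {name!r}")
        if name not in by_name:
            raise KeyError(f"no input or function provides {name!r}")
        func = by_name[name]
        # Resolve every parameter first, then call the function once.
        args = {
            param: resolve(param, visiting + (name,))
            for param in inspect.signature(func).parameters
        }
        computed[name] = func(**args)
        return computed[name]

    return {name: resolve(name) for name in wanted}

result = build_columns(
    [spend_per_signup, spend_per_signup_rounded],
    inputs={"spend": [10.0, 20.0, 45.0], "signups": [4, 5, 9]},
    wanted=["spend_per_signup_rounded"],
)
```

Only the columns needed for `wanted` are computed, and each intermediate column is computed once and cached. This sketch uses plain lists as columns and does nothing in parallel; a real framework would operate on dataframe columns and may schedule independent branches of the graph concurrently.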

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Read files from s3 using Pandas/s3fs or AWS Data Wrangler?

    3 projects | /r/dataengineering | 6 Dec 2023
  • The Distributed Tensor Algebra Compiler (2022)

    4 projects | news.ycombinator.com | 15 Jun 2023
  • Why are physics undergrads told to "learn programming" and what does this consist of?

    2 projects | /r/PhysicsStudents | 19 May 2023
  • We are the developers behind pandas, currently preparing for the 2.0 release :) AMA

    9 projects | /r/Python | 1 Mar 2023
  • What are the best Python libraries to learn for beginners?

    4 projects | /r/Python | 31 Jan 2023