Modern Pandas (Part 2): Method Chaining

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • data_algebra

    Codd method-chained SQL generator and Pandas data processing in Python.

  • There are a number of packages in Python specializing in variations of piped processing in Pandas. My own is this one: https://github.com/WinVector/data_algebra .

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • I'd recommend checking out polars as an alternative to pandas - https://github.com/pola-rs/polars

    It has a rather different API and is significantly faster. Highly recommend it.

  • dataiter

    Python classes for data manipulation

  • Here's another alternative. I wrote Dataiter specifically because I too was frustrated with Pandas. In my experience, if you design a new API from scratch (and don't try to reimplement the Pandas API, as many projects have done!) and have some vision and consistent principles, it's entirely possible to end up with a good, intuitive API. Two relevant issues remain. First, you're limited by NumPy's datatypes and their problems, such as memory-hogging strings and the lack of a proper missing value (NA). Second, you're limited by the Python language: compared to e.g. dplyr's non-standard evaluation, you'll need to use lambda functions, which are unfortunately clumsy and verbose.

    https://github.com/otsaloma/dataiter

    Here's a comparison of dplyr vs. Dataiter vs. Pandas, which should give a quick overview of the similarities and differences.

    https://dataiter.readthedocs.io/en/latest/_static/comparison...

  • mito

    The mitosheet package, trymito.io, and other public Mito code.

  • My team has been trying to modernize pandas from a different tack. Regardless of the struggles with its syntax, Pandas seems to be very sticky, and we don't predict much migration to other data science languages. Instead of refining the syntax, we have combined it with a spreadsheet GUI (https://github.com/mito-ds/monorepo). Here, we worry less about writing perfect syntax ourselves and let the GUI write the code for operations like pivot tables and merges that work well visually.

  • chain-ops-python

    Simple chaining of operations (a.k.a. pipe operator) in python

  • You don't need pandas to do chaining. It's a one-liner in pure python: https://github.com/tpapastylianou/chain-ops-python

    Not to mention, it's a lot more debuggable this way (which is generally the biggest downside to most specialised chaining approaches).
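
    To illustrate the pure-Python chaining idea in the comment above, here is a minimal sketch built on `functools.reduce`. It is not the code from the linked repository; the names `chain` and `debug` are hypothetical and chosen only for this example. The pass-through `debug` step shows one way intermediate values could be inspected, which relates to the debuggability point.

```python
from functools import reduce


def chain(value, *funcs):
    """Pass `value` through each function in turn, left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def debug(label):
    """Return a pass-through step that prints the intermediate value."""
    def step(x):
        print(f"{label}: {x!r}")
        return x
    return step


# Hypothetical pipeline: sort, inspect, scale, then sum.
result = chain(
    [3, 1, 2],
    sorted,
    debug("after sort"),
    lambda xs: [x * 10 for x in xs],
    sum,
)
```

    Because every step is an ordinary callable, a step can be removed, reordered, or wrapped without touching the others.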

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • The Design Philosophy of Great Tables (Software Package)

    7 projects | news.ycombinator.com | 4 Apr 2024
  • Welcome to 14 days of Data Science!

    1 project | dev.to | 7 Mar 2024
  • Read files from s3 using Pandas/s3fs or AWS Data Wrangler?

    3 projects | /r/dataengineering | 6 Dec 2023
  • Data Science for Beginners - A Curriculum

    1 project | /r/programming | 8 Sep 2023
  • Pandas AI – The Future of Data Analysis

    7 projects | news.ycombinator.com | 17 May 2023