Modern Pandas (Part 2): Method Chaining

data_algebra

Mentions: 5 | Stars: 113 | Activity: 8.5 | Language: Python

Codd method-chained SQL generator and Pandas data processing in Python.

There are a number of Python packages that specialize in variations of piped processing in Pandas. My own is this one: https://github.com/WinVector/data_algebra
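For readers unfamiliar with the style, here is a minimal sketch of piped processing using only pandas' built-in methods. It does not use data_algebra, and the `orders` table, its columns, and the `add_total` helper are made up for illustration.

```python
import pandas as pd


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a hypothetical 'total' column (price * qty)."""
    return df.assign(total=df["price"] * df["qty"])


# Illustrative data only; not taken from the original post.
orders = pd.DataFrame(
    {
        "item": ["a", "b", "a", "c"],
        "price": [2.0, 5.0, 2.0, 1.0],
        "qty": [3, 1, 2, 10],
    }
)

# Each step returns a new DataFrame, so the pipeline reads top to bottom
# without intermediate variables.
summary = (
    orders
    .pipe(add_total)                       # custom function as a chain step
    .query("total > 4")                    # keep rows with total above 4
    .groupby("item", as_index=False)["total"]
    .sum()                                 # total per item
    .sort_values("total", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

`DataFrame.pipe` is what lets an ordinary function such as `add_total` sit inside the chain alongside the built-in methods.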

polars

Mentions: 144 | Stars: 26,378 | Activity: 10.0 | Language: Rust

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

I'd recommend checking out polars as an alternative to pandas: https://github.com/pola-rs/polars
It has a rather different API and is significantly faster. Highly recommend it.

dataiter

Mentions: 2 | Stars: 25 | Activity: 7.8 | Language: Python

Python classes for data manipulation

Here's another alternative. I wrote Dataiter specifically because I too was frustrated with Pandas. In my experience, if you design a new API from scratch (and don't try to reimplement the Pandas API, as many projects have done!) with some vision and consistent principles, it's quite possible to end up with a good, intuitive API. Two relevant issues remain. First, you're limited by NumPy's datatypes and their problems, such as memory-hogging strings and the lack of a proper missing value (NA). Second, you're limited by the Python language: compared to e.g. dplyr's non-standard evaluation, you'll need to use lambda functions, which are unfortunately clumsy and verbose.
https://github.com/otsaloma/dataiter
Here's a comparison of dplyr vs. Dataiter vs. Pandas, which should give a quick overview of the similarities and differences.
https://dataiter.readthedocs.io/en/latest/_static/comparison...
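To make the point about lambdas concrete, here is a minimal pure-Python sketch. The `Table` class and its `filter` and `modify` methods are hypothetical stand-ins written for this illustration; they are not Dataiter's actual API, and the car data is made up.

```python
class Table:
    """Hypothetical minimal column store, for illustration only."""

    def __init__(self, **columns):
        self.columns = {name: list(values) for name, values in columns.items()}

    def rows(self):
        """Return the data as a list of {column: value} dicts."""
        names = list(self.columns)
        return [dict(zip(names, vals)) for vals in zip(*self.columns.values())]

    def filter(self, predicate):
        """Keep the rows for which predicate(row) is true."""
        kept = [row for row in self.rows() if predicate(row)]
        return Table(**{name: [r[name] for r in kept] for name in self.columns})

    def modify(self, **new_columns):
        """Add columns computed row by row from the given functions."""
        rows = self.rows()
        added = {name: [f(r) for r in rows] for name, f in new_columns.items()}
        return Table(**{**self.columns, **added})


cars = Table(model=["a", "b", "c"], hp=[110, 93, 175], wt=[2.6, 2.3, 3.4])

# In dplyr, non-standard evaluation lets bare column names be used:
#   cars %>% filter(hp > 100) %>% mutate(hp_per_wt = hp / wt)
# Python evaluates arguments eagerly, so each expression has to be
# wrapped in a function that is called later with the data.
result = (
    cars
    .filter(lambda r: r["hp"] > 100)
    .modify(hp_per_wt=lambda r: r["hp"] / r["wt"])
)
print(result.columns)
```

The extra `lambda r: r[...]` wrapping on every expression is the verbosity the comment refers to.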

mito

Mentions: 18 | Stars: 2,223 | Activity: 10.0 | Language: Python

The mitosheet package, trymito.io, and other public Mito code.

My team has been taking a different tack on modernizing pandas. Regardless of struggles with the syntax, Pandas seems very sticky, and we don't predict much migration to other data science languages. Instead of refining the syntax, we have combined it with a spreadsheet GUI (https://github.com/mito-ds/monorepo). Here, we worry less about writing perfect syntax ourselves and let the GUI write the code for operations like pivot tables and merges that work well visually.

chain-ops-python

Mentions: 2 | Stars: 0 | Activity: 10.0

Simple chaining of operations (a.k.a. pipe operator) in python

You don't need pandas to do chaining. It's a one-liner in pure Python: https://github.com/tpapastylianou/chain-ops-python
Not to mention, it's a lot more debuggable this way (debugging is generally the biggest downside of most specialised chaining approaches).
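One way such a one-liner might look is sketched below with `functools.reduce`. This is an assumption about the general technique, not the code from the linked repository; the names `chain` and `debug` are chosen here for illustration.

```python
from functools import reduce


def chain(value, *funcs):
    """Pass value through funcs left to right: chain(x, f, g) == g(f(x))."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def debug(label):
    """Return a pass-through step that prints the intermediate value."""
    def step(x):
        print(f"{label}: {x!r}")
        return x
    return step


result = chain(
    [3, 1, 2],
    sorted,                            # order the values
    debug("after sort"),               # inspect the value mid-chain
    lambda xs: [x * 10 for x in xs],   # scale each element
    sum,                               # reduce to a single number
)
print(result)
```

Because every step is an ordinary function, a print or breakpoint step like `debug` can be dropped in anywhere without changing the rest of the chain.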

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

The Design Philosophy of Great Tables (Software Package)
7 projects | news.ycombinator.com | 4 Apr 2024

Welcome to 14 days of Data Science!
1 project | dev.to | 7 Mar 2024

Read files from s3 using Pandas/s3fs or AWS Data Wrangler?
3 projects | /r/dataengineering | 6 Dec 2023

Data Science for Beginners - A Curriculum
1 project | /r/programming | 8 Sep 2023

Pandas AI – The Future of Data Analysis
7 projects | news.ycombinator.com | 17 May 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
Tags: Python, Data Science, Data Analysis, dataframe-library, Pandas
Post date: 1 May 2022
