Siuba – A Dplyr Port to Python

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • siuba

    Python library for using dplyr like syntax with pandas and SQL

  • Hey, thanks for pointing out Self; I definitely need to dig into fastcore more!

    One motivation for developing siuba is that the grouped agg you show requires users to specify only one operation on one column.

    E.g.

    1. Calculate mean of x

    However, common operations like demeaning a column are multiple operations:

    1. Calculate mean of x

    2. Subtract result of (1) from x

    In siuba you can just write mutate(res = _.x - _.x.mean()). This isn't possible with something like gdf.x.agg("mean"), and from what I can tell, it's deeply confusing to analysts :/.
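    For comparison, here is a minimal vanilla pandas sketch of the same two-step demeaning, assuming a hypothetical frame `df` with a grouping column `g` and a value column `x`. `agg("mean")` collapses each group to one row, so its result cannot be subtracted from `x` directly; `transform("mean")` broadcasts the group mean back onto every row, which is what demeaning needs.

```python
import pandas as pd

# Hypothetical example data: two groups, "a" and "b".
df = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1.0, 3.0, 10.0, 20.0]})

# agg("mean") returns one value per group (indexed by g), not one per row.
group_means = df.groupby("g").x.agg("mean")

# transform("mean") returns a Series aligned with df's rows, so it can be
# subtracted from x to demean within each group.
demeaned = df.assign(res=df.x - df.groupby("g").x.transform("mean"))
print(demeaned)
```

    The pandas version has to repeat `df` and the grouping, which is the verbosity that siuba's `_.x - _.x.mean()` inside a grouped `mutate` is meant to avoid.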

    In vanilla pandas I really like to use the chaining method you laid out, and siuba to me is mostly a utility library for making the approach a little more succinct / performant[1].

    siuba has experimental autocompletion (thanks to Tim Mastny!), and there's a pretty hefty technical write-up on how it uses IPython machinery for that in siuba's architectural decision record folder[2].

    [1]: https://siuba.readthedocs.io/en/latest/developer/pandas-grou...

    [2]: https://github.com/machow/siuba/blob/master/examples/archite...

  • data_algebra

    Codd method-chained SQL generator and Pandas data processing in Python.

  • Neat. I've been working on my own "piped-Codd" style system that I call the "data algebra": https://github.com/WinVector/data_algebra

    I use method chaining as the composing notation.
