Hey, thanks for pointing out Self--I definitely need to dig into fastcore more!
One motivation for developing siuba is that the grouped agg you show only lets users specify a single operation on a single column.
E.g.
1. Calculate mean of x
However, common operations like demeaning a column are multiple operations:
1. Calculate mean of x
2. Subtract result of (1) from x
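To make the two steps concrete, here is a minimal pandas sketch. The toy columns `g` and `x` are hypothetical. A grouped agg returns one value per group, so a second step is needed to line those means back up with the original rows before subtracting.

```python
import pandas as pd

# Hypothetical toy data: a grouping column "g" and a value column "x".
df = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1.0, 3.0, 10.0, 20.0]})

# Step 1: the grouped agg yields one mean per group, indexed by group label.
group_means = df.groupby("g")["x"].agg("mean")

# Step 2: map each row's group label to its group mean, then subtract from x.
df["res"] = df["x"] - df["g"].map(group_means)
print(df["res"].tolist())
```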
In siuba you can just write mutate(res = _.x - _.x.mean()). This isn't possible with something like gdf.x.agg("mean"), and from what I can tell, that limitation is deeply confusing to analysts :/.
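Roughly speaking, the `_` in that expression is a deferred placeholder: it records attribute access, method calls, and operators, and only evaluates them once a DataFrame is supplied. Below is a stripped-down sketch of that idea for ungrouped data. It is not siuba's implementation (siuba's `_` supports far more operators and grouped data); `Expr`, `_eval`, and this `mutate` are hypothetical names used only for illustration.

```python
import pandas as pd


class Expr:
    """Hypothetical deferred expression: wraps a function of the data."""

    def __init__(self, fn):
        self.fn = fn

    def __getattr__(self, name):
        # _.x -> an expression that looks up attribute "x" on the data later.
        return Expr(lambda data: getattr(self.fn(data), name))

    def __call__(self, *args, **kwargs):
        # _.x.mean() -> call the looked-up method once data is supplied.
        return Expr(lambda data: self.fn(data)(*args, **kwargs))

    def __sub__(self, other):
        # _.x - <expr> -> evaluate both sides against the same data, then subtract.
        return Expr(lambda data: self.fn(data) - _eval(other, data))

    def evaluate(self, data):
        return self.fn(data)


def _eval(obj, data):
    # Evaluate deferred expressions; pass plain values through unchanged.
    return obj.evaluate(data) if isinstance(obj, Expr) else obj


# The root placeholder simply returns the data it is given.
_ = Expr(lambda data: data)


def mutate(df, **exprs):
    # Hypothetical ungrouped stand-in for siuba's mutate.
    out = df.copy()
    for name, expr in exprs.items():
        out[name] = _eval(expr, out)
    return out


df = pd.DataFrame({"x": [1.0, 2.0, 6.0]})
out = mutate(df, res=_.x - _.x.mean())
print(out["res"].tolist())
```

Because nothing is computed until `mutate` passes the DataFrame in, a single expression can mix a column with an aggregate of that same column.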
In vanilla pandas I really like to use the chaining method you laid out, and to me siuba is mostly a utility library for making that approach a little more succinct / performant[1].
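The parent comment's chaining example isn't quoted here. Assuming it is along the lines of `assign` plus `groupby(...).transform`, a grouped demean in vanilla pandas might look like the following sketch (toy data hypothetical). `transform("mean")` returns a result aligned with the original rows, which is what lets the subtraction happen in a single chained step.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1.0, 3.0, 10.0, 20.0]})

result = (
    df
    # Per-group mean broadcast back to every row, then subtracted from x.
    .assign(res=lambda d: d["x"] - d.groupby("g")["x"].transform("mean"))
    # A further chained step: keep only rows above their group mean.
    .loc[lambda d: d["res"] > 0]
)
print(result["res"].tolist())
```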
siuba has experimental autocompletion (thanks to Tim Mastny!), and there's a pretty hefty technical write-up on how it uses IPython machinery for that in siuba's architectural decision record folder[2].
[1]: https://siuba.readthedocs.io/en/latest/developer/pandas-grou...
[2]: https://github.com/machow/siuba/blob/master/examples/archite...
Neat. I've been working on my own "piped-Codd"-style system that I call the "data algebra": https://github.com/WinVector/data_algebra
I use method chaining as the composing notation.