datar: the dplyr in python

This page summarizes the projects mentioned and recommended in the original post on /r/Rlanguage

  • datar

    A Grammar of Data Manipulation in python

  • datar not only mimics the piping syntax but also follows the API design of dplyr as closely as possible, and it is tested against dplyr's test cases.

  • plydata

    A grammar for data manipulation in Python

  • Basic usage:

        from datar import f
        from datar.dplyr import mutate, filter, if_else
        from datar.tibble import tibble
        # or
        # from datar.all import f, mutate, filter, if_else, tibble

        df = tibble(
            x=range(4),
            y=['zero', 'one', 'two', 'three']
        )

        df >> mutate(z=f.x)
        """# output
           x      y  z
        0  0   zero  0
        1  1    one  1
        2  2    two  2
        3  3  three  3
        """

        df >> mutate(z=if_else(f.x>1, 1, 0))
        """# output:
           x      y  z
        0  0   zero  0
        1  1    one  0
        2  2    two  1
        3  3  three  1
        """

        df >> filter(f.x>1)
        """# output:
           x      y
        0  2    two
        1  3  three
        """

        df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
        """# output:
           x      y  z
        0  2    two  1
        1  3  three  1
        """

    Works with plotnine

        # example grabbed from https://github.com/has2k1/plydata
        import numpy
        from datar.base import sin, pi
        from plotnine import ggplot, aes, geom_line, theme_classic

        df = tibble(x=numpy.linspace(0, 2*pi, 500))
        (df >>
            mutate(y=sin(f.x), sign=if_else(f.y>=0, "positive", "negative")) >>
            ggplot(aes(x='x', y='y')) +
            theme_classic() +
            geom_line(aes(color='sign'), size=1.2))

    https://preview.redd.it/w0hs4m8fyf771.png?width=697&format=png&auto=webp&s=eadd7473a9e3393c2d58531c0b2b12f849c27e5e

    Easy to integrate with other libraries

        import klib
        from pipda import register_verb
        from datar.datasets import iris
        from datar.dplyr import pull

        dist_plot = register_verb(func=klib.dist_plot)
        iris >> pull(f.Sepal_Length) >> dist_plot()

    https://preview.redd.it/w8b8ouagyf771.png?width=892&format=png&auto=webp&s=3cc8f04e63be710f593b2b6128073f65cf7ffaa4

    For more detailed and advanced usage, see https://pwwang.github.io/datar/

  • pipda

    A framework for data piping in python

  • I wrote a framework (https://github.com/pwwang/pipda) that fits this situation, and it makes it easier for me to port those APIs to Python as they are. I am not only following the documentation of the original APIs but also reading their R source code so that I can reproduce most of their behavior. I wouldn't say it's perfect, given the differences between the languages, but I would say it is the closest and most complete port of dplyr/tidyr and related packages in Python.
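
For readers curious how a `data >> verb(...)` pipe with deferred `f.x` references can work in Python at all, here is a minimal sketch of the general idea. It is not pipda's or datar's actual implementation: the names `Expr`, `Verb`, `verb`, and `filter_rows` are hypothetical, and a plain list of dicts stands in for a data frame. It relies only on Python's reflected operator `__rrshift__` and on operator overloading to record expressions for later evaluation.

```python
import operator


class Expr:
    """A deferred expression: stores how to compute a value from a row."""

    def __init__(self, func):
        self._func = func

    def evaluate(self, row):
        return self._func(row)

    def _binary(self, other, op):
        # Combine this expression with another expression or a constant.
        def run(row):
            right = other.evaluate(row) if isinstance(other, Expr) else other
            return op(self.evaluate(row), right)
        return Expr(run)

    def __gt__(self, other):
        return self._binary(other, operator.gt)

    def __eq__(self, other):
        return self._binary(other, operator.eq)

    def __add__(self, other):
        return self._binary(other, operator.add)


class _ColumnProxy:
    """`f.x` builds an Expr that looks up column "x" when evaluated."""

    def __getattr__(self, name):
        return Expr(lambda row: row[name])


f = _ColumnProxy()  # hypothetical stand-in for datar's `f`


class Verb:
    """A verb call that waits for its data, supplied by `data >> verb(...)`."""

    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __rrshift__(self, data):
        # Called for `data >> self` when `data` does not implement `>>`.
        return self.func(data, *self.args, **self.kwargs)


def verb(func):
    """Decorator (hypothetical): calling the verb returns a pending Verb."""
    def wrapper(*args, **kwargs):
        return Verb(func, args, kwargs)
    return wrapper


@verb
def mutate(rows, **columns):
    out = []
    for row in rows:
        new_row = dict(row)
        for name, expr in columns.items():
            new_row[name] = expr.evaluate(new_row) if isinstance(expr, Expr) else expr
        out.append(new_row)
    return out


@verb
def filter_rows(rows, condition):
    return [row for row in rows if condition.evaluate(row)]


rows = [{"x": i, "y": name} for i, name in enumerate(["zero", "one", "two", "three"])]
result = rows >> mutate(z=f.x + 10) >> filter_rows(f.x > 1)
print(result)
```

Because a plain list does not define `>>`, Python falls back to `Verb.__rrshift__`, which is what lets the data sit on the left of the pipe. A real library has to handle much more (data frames, grouping, nested verb calls), so treat this only as an illustration of the mechanism.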

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • datar: the dplyr in python

    2 projects | /r/Python | 25 Jun 2021
  • Show HN: Hashquery, a Python library for defining reusable analysis

    1 project | news.ycombinator.com | 23 Apr 2024
  • The Design Philosophy of Great Tables (Software Package)

    7 projects | news.ycombinator.com | 4 Apr 2024
  • Ibis: The portable Python dataframe library

    1 project | news.ycombinator.com | 13 Mar 2024
  • Welcome to 14 days of Data Science!

    1 project | dev.to | 7 Mar 2024