Pandas AI – The Future of Data Analysis

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • sketch

    AI code-writing assistant that understands data content

  • This morning I added a "Related Projects" [3] section to the Buckaroo docs. If Buckaroo doesn't solve your problem, look at one of the other linked projects (like Mito).

    [1] https://github.com/approximatelabs/sketch

    [2] https://github.com/paddymul/buckaroo

    [3] https://buckaroo-data.readthedocs.io/en/latest/FAQ.html

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • I asked GPT-4 this:

    Can you visit https://pandas.pydata.org/about/governance.html and tell me if I am allowed to use the term 'pandas' in the name of another unaffiliated project, for example 'pandas-ai'?

    --

    Based on the BSD 3-Clause License under which pandas is released, neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission [1]. This means that to use the term 'pandas' in the name of another unaffiliated project such as 'pandas-ai', you would likely need to get written permission from the pandas project's copyright holders.

    However, please note that this is not legal advice, and it would be a good idea to consult with a lawyer who specializes in open-source software or intellectual property law to ensure that you're in compliance with all legal requirements.

    [1] https://github.com/pandas-dev/pandas/blob/main/LICENSE

  • mito

    The mitosheet package, trymito.io, and other public Mito code.

  • I think the biggest area for growth for LLM based tools for data analysis is around helping users _understand what edits they actually made_.

    I'm a co-founder of a non-AI data code-gen tool for data analysis -- but we also have a basic version of an LLM integration. The problem we see with tooling like Pandas AI (in practice! with real users at enterprises!) is that users make an edit like "remove NaN values" and then get a new dataframe -- but they have no way of checking if the edited dataframe is actually what they want. Maybe the LLM removed NaN values. Maybe it just deleted some random rows!

    The key question here: how can users build an understanding of how their data changed, and confirm that the changes the LLM made are the changes they wanted? In other words, recon!

    We've been experimenting more with this recon step in the AI flow (you can see the final PR here: https://github.com/mito-ds/monorepo/pull/751). It takes a similar approach to the top comment (passing a subset of the data to the LLM), and then focuses the UI on "what changes were made." There's a lot of opportunity for growth here, I think!

    Any/all feedback appreciated :)

  • pandas-ai

    Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5/4, Anthropic, VertexAI) and RAG.

  • The Medium article is OK, if blocked at times. It is just a summary and was not written by the package author.

    You can jump to the code at https://github.com/gventuri/pandas-ai to see more of what it's trying to do.

  • pandasql

    sqldf for pandas

  • buckaroo

    Buckaroo - the data wrangling assistant for pandas. Quickly explore dataframes, and run pandas commands via a GUI. Works inside the Jupyter notebook.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
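
The Mito co-founder's comment above mentions passing a subset of the data to the LLM instead of the whole dataframe. A minimal sketch of that idea, assuming pandas is installed: the hypothetical `build_prompt` helper describes the dataframe with its row count, column dtypes, and only the first few rows. It is not taken from Mito or PandasAI, and the prompt wording is an assumption.

```python
import pandas as pd


def build_prompt(df: pd.DataFrame, request: str, n_rows: int = 5) -> str:
    """Hypothetical helper: describe `df` to an LLM using only a small sample.

    The prompt contains the row count, each column's dtype, and the first
    `n_rows` rows as CSV; the remaining rows are not included.
    """
    schema = "\n".join(f"- {col}: {dtype}" for col, dtype in df.dtypes.items())
    sample = df.head(n_rows).to_csv(index=False)
    return (
        f"The dataframe `df` has {len(df)} rows and these columns:\n{schema}\n\n"
        f"First {min(n_rows, len(df))} rows (CSV):\n{sample}\n"
        f"Write pandas code that does the following: {request}"
    )
```

For example, `build_prompt(df, "remove NaN values", n_rows=2)` sends the schema plus two sample rows, so the model sees enough structure to write code without receiving the full dataset.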
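
The recon step described in the Mito co-founder's comment can be approximated by diffing the dataframe before and after the LLM's edit. A minimal sketch, assuming pandas and an edit that keeps the original index labels (as `dropna()`, `drop()`, and `fillna()` do); `summarize_changes` is a hypothetical helper, not the code from the linked Mito PR. If `rows_removed` is larger than `removed_rows_with_nan`, some dropped rows contained no NaN at all, which is the "deleted some random rows" case from the comment.

```python
import pandas as pd


def summarize_changes(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical recon helper: summarize how `after` differs from `before`.

    Assumes the edit kept the original index labels, so rows can be
    matched by index.
    """
    removed = before.index.difference(after.index)
    added = after.index.difference(before.index)
    kept = before.index.intersection(after.index)
    shared_cols = before.columns.intersection(after.columns)

    # Compare only cells present in both frames. NaN != NaN, so a cell that
    # is missing on both sides is not counted as a change.
    old = before.loc[kept, shared_cols]
    new = after.loc[kept, shared_cols]
    changed = (old != new) & ~(old.isna() & new.isna())

    return {
        "rows_removed": len(removed),
        "rows_added": len(added),
        # How many of the dropped rows actually contained a missing value.
        "removed_rows_with_nan": int(before.loc[removed].isna().any(axis=1).sum()),
        "cells_changed": int(changed.to_numpy().sum()),
        "columns_added": list(after.columns.difference(before.columns)),
        "columns_removed": list(before.columns.difference(after.columns)),
    }
```

Matching rows by index keeps the check cheap and explicit; an edit that resets the index (for example `reset_index(drop=True)`) would defeat it, so a real tool would need a sturdier row key.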
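
The pandasql entry's tagline, "sqldf for pandas", refers to running SQL queries directly against DataFrames; pandasql does this with SQLite. The sketch below reproduces the idea using only pandas and the standard library's `sqlite3`: each DataFrame is copied into an in-memory SQLite table and the query result is read back as a new DataFrame. `query_frames` is a hypothetical helper for illustration, not pandasql's API.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def query_frames(sql: str, **frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: run SQLite SQL against DataFrames.

    Each keyword argument is copied into a table of the same name in a
    throwaway in-memory SQLite database, then the query result is read back.
    """
    with closing(sqlite3.connect(":memory:")) as conn:
        for name, frame in frames.items():
            frame.to_sql(name, conn, index=False)
        return pd.read_sql_query(sql, conn)
```

For example, `query_frames("SELECT region, SUM(amount) AS total FROM sales GROUP BY region", sales=sales)` aggregates a `sales` DataFrame with plain SQL. Copying every frame into SQLite on each call keeps the helper stateless but is slow for large frames.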