-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
-
buckaroo
Buckaroo - the data wrangling assistant for pandas. Quickly explore dataframes, and run pandas commands via a GUI. Works inside the Jupyter notebook.
This morning I added a "Related Projects" section [3] to the Buckaroo docs. If Buckaroo doesn't solve your problem, look at one of the other linked projects (like Mito).
[1] https://github.com/approximatelabs/sketch
[2] https://github.com/paddymul/buckaroo
[3] https://buckaroo-data.readthedocs.io/en/latest/FAQ.html
I asked GPT-4 this:
can you visit https://pandas.pydata.org/about/governance.html and tell me if I am allowed to use the term 'pandas' in the name of another unaffiliated project, for example 'pandas-ai'
--
Based on the BSD 3-Clause License under which pandas is released, neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission [1]. This means that to use the term 'pandas' in the name of another unaffiliated project such as 'pandas-ai', you would likely need to get written permission from the pandas project's copyright holders.
However, please note that this is not legal advice, and it would be a good idea to consult with a lawyer who specializes in open-source software or intellectual property law to ensure that you're in compliance with all legal requirements.
[1] https://github.com/pandas-dev/pandas/blob/main/LICENSE
I think the biggest area for growth for LLM based tools for data analysis is around helping users _understand what edits they actually made_.
I'm a co-founder of a non-AI data code-gen tool for data analysis -- but we also have a basic version of an LLM integration. The problem we see with tooling like Pandas AI (in practice! with real users at enterprises!) is that users make an edit like "remove NaN values" and then get a new dataframe -- but they have no way of checking if the edited dataframe is actually what they want. Maybe the LLM removed NaN values. Maybe it just deleted some random rows!
The key question here: how can users understand how their data changed, and confirm that the changes the LLM made are the changes they wanted? In other words, recon (reconciliation)!
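To make the recon idea concrete, here is a minimal sketch of a before/after comparison in plain pandas. It is not Mito's implementation; `reconcile` is a hypothetical helper, and it assumes the edit keeps the original index labels (true for `dropna`, not for edits that call `reset_index`). The last field answers the "did it just delete random rows?" question by listing removed rows that contained no NaN at all.

```python
import numpy as np
import pandas as pd


def reconcile(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize how `after` differs from `before`.

    Assumes rows keep their original index labels through the edit.
    """
    removed = before.index.difference(after.index)
    added = after.index.difference(before.index)
    kept = before.index.intersection(after.index)
    common_cols = before.columns.intersection(after.columns)

    # Compare surviving cells; NaN on both sides counts as "unchanged".
    b = before.loc[kept, common_cols]
    a = after.loc[kept, common_cols]
    changed = ~((b == a) | (b.isna() & a.isna()))

    # Removed rows that had no missing values: suspicious for a "remove NaN" edit.
    removed_df = before.loc[removed]
    no_nan = removed_df.index[~removed_df.isna().any(axis=1)]

    return {
        "removed_rows": removed.tolist(),
        "added_rows": added.tolist(),
        "removed_cols": before.columns.difference(after.columns).tolist(),
        "added_cols": after.columns.difference(before.columns).tolist(),
        "changed_cells": int(changed.to_numpy().sum()),
        "rows_removed_without_nan": no_nan.tolist(),
    }


before = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": ["x", "y", None, "z"]})
good = before.dropna()            # what "remove NaN values" should do
bad = before.drop(index=[0, 1])   # row 0 had no NaN, so this edit is wrong

print(reconcile(before, good))
print(reconcile(before, bad))
```

A UI could render the same report as highlighted rows and cells rather than a dict; the point is that the user checks the diff, not just the resulting dataframe.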
We've been experimenting more with this recon step in the AI flow (you can see the final PR here: https://github.com/mito-ds/monorepo/pull/751). It takes a similar approach to the top comment (passing a subset of the data to the LLM), and then focuses the UI on "what changes were made." There's a lot of opportunity for growth here, I think!
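The top comment referred to here is not part of this excerpt, so the following is only a sketch of the general "pass a subset of the data" idea: send the schema plus a small random sample instead of the whole dataframe. `describe_for_llm`, the prompt wording, and the default sample size are all hypothetical choices, not taken from Mito or PandasAI.

```python
import pandas as pd


def describe_for_llm(df: pd.DataFrame, instruction: str,
                     n_rows: int = 5, seed: int = 0) -> str:
    """Hypothetical helper: build a compact prompt from the schema and a row sample."""
    # A reproducible random sample keeps the prompt small for large frames.
    sample = df.sample(n=min(n_rows, len(df)), random_state=seed)
    schema = "\n".join(f"- {col}: {dtype}" for col, dtype in df.dtypes.items())
    return (
        f"The dataframe `df` has {len(df)} rows and these columns:\n{schema}\n\n"
        f"A random sample of {len(sample)} rows:\n{sample.to_string()}\n\n"
        f"Write pandas code that does the following: {instruction}"
    )


df = pd.DataFrame({"city": ["Oslo", "Lima", None], "temp_c": [3.5, 19.0, 25.1]})
prompt = describe_for_llm(df, "remove rows with missing values", n_rows=2)
print(prompt)
```

Sending a sample keeps prompts short and limits how much data leaves the machine, at the cost that the LLM never sees rows outside the sample, which is one more reason to check the result afterwards.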
Any/all feedback appreciated :)
The Medium article is OK, though it's sometimes blocked. It's only a summary, and it wasn't written by the package author.
You can jump to the code at https://github.com/gventuri/pandas-ai to see more of what it's trying to do.