Metric | sketch | mito
---|---|---
Mentions | 20 | 18
Stars | 2,198 | 2,215
Growth | 0.9% | 1.0%
Activity | 4.4 | 10.0
Last commit | 3 months ago | 13 days ago
Language | Python | Python
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sketch
-
Ask HN: What have you built with LLMs?
We've made a lot of data tooling things based on LLMs, and are in the process of rebranding and launching our main product.
1. sketch (in notebook, ai for pandas) https://github.com/approximatelabs/sketch
2. datadm (open source "chat with data", with support for open source LLMs) https://github.com/approximatelabs/datadm
3. Our main product: julyp, https://julyp.com/ (currently under very active rebrand and cleanup), a "chat with data" style app with a lot of specialized features. I also stream myself using it (and sometimes building it) every weekday on Twitch to solve misc data problems (https://www.twitch.tv/bluecoconut)
For your next question, about the stack and deploy:
-
Pandas AI – The Future of Data Analysis
This morning I added a "Related Projects" [3] Section to the Buckaroo docs. If Buckaroo doesn't solve your problem, look at one of the other linked projects (like Mito).
[1] https://github.com/approximatelabs/sketch
[2] https://github.com/paddymul/buckaroo
[3] https://buckaroo-data.readthedocs.io/en/latest/FAQ.html
-
Ask HN: What's your favorite GPT powered tool?
For GPT/Copilot style help for pandas, in a notebook's REPL flow (without needing to install plugins), I built sketch. I genuinely use it every time I'm working on pandas dataframes for a quick one-off analysis. It just makes the iteration loop so much faster. (Specifically `.sketch.howto`; anecdotally, I don't use `.sketch.ask` anymore.)
https://github.com/approximatelabs/sketch
-
RasaGPT: First headless LLM chatbot built on top of Rasa, Langchain and FastAPI
https://github.com/approximatelabs/lambdaprompt It has served all of my personal use-cases since making it, including powering `sketch` (copilot for pandas) https://github.com/approximatelabs/sketch
Core things it does: Uses jinja templates, does sync and async, and most importantly treats LLM completion endpoints as "function calls", which you can compose and build structures around just with simple python. I also combined it with fastapi so you can just serve up any templates you want directly as rest endpoints. It also offers callback hooks so you can log & trace execution graphs.
Altogether it's only ~600 lines of python.
I haven't had a chance to really push out examples of all the different "complex behaviors", so there aren't many patterns to copy. But if you're comfortable in python, then I think it offers a pretty good interface.
I hope to get back to it sometime in the next week to introduce local-mode (e.g. now that the smaller open source models are available, I want to make them first-class)
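To make the "completion endpoints as function calls" idea concrete, here is a minimal offline sketch. It is not lambdaprompt's API: `prompt_function` and `fake_complete` are hypothetical names, the standard library's `string.Template` stands in for the jinja templates mentioned above, and the "LLM" is a stub that upper-cases its prompt so the example runs without a network call.

```python
from string import Template
from typing import Callable


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion endpoint (runs offline)."""
    return prompt.upper()


def prompt_function(
    template: str, complete: Callable[[str], str] = fake_complete
) -> Callable[..., str]:
    """Turn a prompt template into an ordinary Python function."""
    compiled = Template(template)

    def call(**variables: str) -> str:
        # Fill in the template, then send the finished prompt to the endpoint.
        return complete(compiled.substitute(**variables))

    return call


# Each prompt is now a plain callable, so composition is just function calls.
summarize = prompt_function("summarize: $text")
translate = prompt_function("translate to $language: $text")


def summarize_then_translate(text: str, language: str) -> str:
    return translate(text=summarize(text=text), language=language)


result = summarize_then_translate("quarterly sales rose", "French")
print(result)
```

Swapping `fake_complete` for a function that calls a real completion endpoint would leave the composition code unchanged, which is the appeal of the pattern.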
-
[D] The best way to train an LLM on company data
Please look at sketch and the langchain pandas/SQL plugins. I have seen excellent results with both approaches, though both require you to send metadata to OpenAI.
-
Meet Sketch: An AI code Writing Assistant For Pandas
👉 Understand your data through questions
👉 Create code from plain text
Quick Read: https://www.marktechpost.com/2023/02/01/meet-sketch-an-ai-code-writing-assistant-for-pandas/
GitHub: https://github.com/approximatelabs/sketch
-
Replacing a SQL analyst with 26 recursive GPT prompts
(3) Asking for re-writes of failed queries (happens occasionally) also helps
The main challenge, I think, with a lot of these "look it works" tools for data applications is how to get an interface that will actually be easy to adopt. The chat-bot style shown here (discord and slack integration) I can see being really valuable, as I believe this style of integration with data catalog systems has gained some traction recently. People like to ask data questions to other people in slack; adding a bot that tries to answer might short-circuit a lot of this!
We built a prototype where we applied similar techniques to the pandas-code-writing part of the stack, trying to help keep data scientists / data analysts "in flow", integrating the code answers in notebooks (similar to how co-pilot puts suggestions in-line) -- and released https://github.com/approximatelabs/sketch a little while ago.
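The "ask for a rewrite when a query fails" step from point (3) can be sketched as a small retry loop. This is an illustration under stated assumptions, not the code from the linked post: `run_with_rewrites` and `fake_rewrite` are hypothetical names, the table is made up, and the rewrite function is a hard-coded stub standing in for an LLM call that would receive the failed query and its error message.

```python
import sqlite3
from typing import Callable


def run_with_rewrites(
    conn: sqlite3.Connection,
    sql: str,
    rewrite: Callable[[str, str], str],
    max_attempts: int = 3,
) -> list:
    """Run `sql`; on failure, ask `rewrite` for a corrected query and retry."""
    for attempt in range(max_attempts):
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sql = rewrite(sql, str(exc))
    raise ValueError("max_attempts must be at least 1")


def fake_rewrite(sql: str, error: str) -> str:
    """Hypothetical stand-in for an LLM that repairs a failed query."""
    return sql.replace("order_total", "total")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 32.5)])

# The first attempt references a column that does not exist, so it fails
# and the rewritten query is run instead.
rows = run_with_rewrites(conn, "SELECT SUM(order_total) FROM orders", fake_rewrite)
print(rows)
```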
-
FLiP Stack Weekly for 21 Jan 2023
Python AI Helper https://github.com/approximatelabs/sketch
- LangChain: Build AI apps with LLMs through composability
- Show HN: Sketch – AI code-writing assistant that understands data content
mito
-
The Design Philosophy of Great Tables (Software Package)
2. The report you're sending out for display is _expected_ in an Excel format. The two main reasons for this are just organizational momentum, or that you want to let the receiver conduct additional ad-hoc analysis (Excel is best for this in almost every org).
The way we've sliced this problem space is by improving the interfaces that users can use to export formatting to Excel. You can see some of our (open-core) code here [2]. TL;DR: Mito gives you an interface in Jupyter that looks like a spreadsheet, where you can apply formatting like Excel (number formatting, conditional formatting, color formatting) - and then Mito automatically generates code that exports this formatting to an Excel file. This is one of our more compelling enterprise features, for decision makers that work with non-expert Python programmers - getting formatting into Excel is a big hassle.
[1] https://trymito.io
[2] https://github.com/mito-ds/mito/blob/dev/mitosheet/mitosheet...
-
What codegen is (actually) good for
3. So you do want to do code-gen, does it make sense to do it in a chat interface, or can we do better?
As a Figma user, I'd answer these in the following way:
> Why is it necessary to generate code in the first place?
Because mockups aren't your production website, and your production website is written in code. But maybe this is just for now?
I'm sure some high-up PM at Figma has this as their goal - mockup the website in Figma, it generates the code for a website (you don't see this code!), and then you can click deploy _so easily_. Who wants to bet that hosting services like Vercel etc reach out to Figma once a week to try and pitch them...
In the meantime, while we have websites that don't fit neatly inside Figma constraints, while developers are easier to hire than good designers (in my experience), while no-code tools are continually thought of as limiting and a bad long-term solution -- Figma code export is good.
> Why is just writing the code by the hand not the best solution?
For the majority of us full-stack devs who have written >0 CSS but are less than masters, I'll leave this as self-evident.
> So you do want to do code-gen, does it make sense to do it in a chat interface, or can we do better?
In the case of Figma, if they were a new startup with no existing product and they were trying to "automate UI creation" -- v1 of their interface probably would be a "describe your website" and then we'll generate the code for it.
This would probably suck. What if you wanted to easily tweak the output? What if you had trouble describing what you wanted, but you could draw it (ok, OpenAI vision might help on this one)? What if you had experience with existing design tools you could use to augment the AI? A chat interface is not the best interface for design work.
ChatGPT-style code-generation is like v0.1. Github Copilot is an example of the next step - it's not just a chat interface, it's something a bit more integrated into an environment that makes sense in the context of the work you're doing. For design work, a canvas (literally! [2]) like Figma is well-suited as an environment for code-gen that can augment (and maybe one day replace) the programmers working on frontend. For tabular data work, we think a spreadsheet is the interface where users want to be, and the interface it makes sense to bring code-gen to.
Any thoughts appreciated!
[1] https://trymito.io, https://github.com/mito-ds/mito
-
Pandas AI – The Future of Data Analysis
I think the biggest area for growth for LLM based tools for data analysis is around helping users _understand what edits they actually made_.
I'm a co-founder of a non-AI data code-gen tool for data analysis -- but we also have a basic version of an LLM integration. The problem we see with tooling like Pandas AI (in practice! with real users at enterprises!) is that users make an edit like "remove NaN values" and then get a new dataframe -- but they have no way of checking if the edited dataframe is actually what they want. Maybe the LLM removed NaN values. Maybe it just deleted some random rows!
The key here: how can users build an understanding of how their data changed, and confirm that the changes made by the LLM are the changes they wanted. In other words, recon!
We've been experimenting more with this recon step in the AI flow (you can see the final PR here: https://github.com/mito-ds/monorepo/pull/751). It takes a similar approach to the top comment (passing a subset of the data to the LLM), and then really focuses in the UI around "what changes were made." There's a lot of opportunity for growth here, I think!
Any/all feedback appreciated :)
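As a rough illustration of the recon idea (not the implementation in the linked PR), the check below compares a dataset before and after a "remove NaN values" edit and reports which rows disappeared and whether they actually contained NaN. The `recon` and `has_nan` helpers, the `id` key, and the sample rows are all assumptions made for the example.

```python
import math


def has_nan(row: dict) -> bool:
    """True if any value in the row is a float NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in row.values())


def recon(before: list, after: list, key: str = "id") -> dict:
    """Summarize what an edit changed so a user can confirm it was intended."""
    kept = {row[key] for row in after}
    removed = [row for row in before if row[key] not in kept]
    return {
        "rows_before": len(before),
        "rows_after": len(after),
        # Removals that match the request ("remove NaN values").
        "removed_with_nan": [r[key] for r in removed if has_nan(r)],
        # Removals the user did not ask for.
        "removed_without_nan": [r[key] for r in removed if not has_nan(r)],
        # NaN rows the edit missed.
        "nan_rows_remaining": [r[key] for r in after if has_nan(r)],
    }


before = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": float("nan")},
    {"id": 3, "price": 4.0},
]
# Pretend an LLM-generated edit returned this: it also dropped row 3.
after = [{"id": 1, "price": 9.5}]

report = recon(before, after)
print(report)
```

Here the report flags row 3 as removed even though it had no NaN, which is exactly the kind of silent change the comment above is worried about.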
-
The hand-picked selection of the best Python libraries and tools of 2022
Mito — spreadsheet inside notebooks
- I made an open source spreadsheet that turns your edits into Python
-
I made a tool that turns Excel into Python
You can see the open source code here.
-
I made a Spreadsheet for Python beginners that writes Python for you
Here is the Github again.
-
Learn Python through your Spreadsheet Skills
Mito is an open source Python package that allows the user to call an interactive spreadsheet into their Python environment. Each edit made in the spreadsheet generates the equivalent Python.
- A Spreadsheet for Data Science that Writes Python for Every Edit
-
Mito lets you write Python by editing a spreadsheet
Mito is an open source Python tool that allows you to call a spreadsheet into your Python environment. Each edit you make in the spreadsheet generates the equivalent Python for you. This allows users to access Python with the spreadsheet skills they already have. Here is the GitHub.
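The "each edit generates the equivalent Python" idea can be illustrated with a toy code generator. This is a sketch of the general technique only, not Mito's implementation: the edit dictionaries, their `type` names, and `generate_code` are hypothetical. The generator only builds strings, so it runs without pandas installed; the emitted lines use standard pandas calls (`drop`, boolean indexing, `sort_values`).

```python
def generate_code(df_name: str, edits: list) -> str:
    """Translate a log of spreadsheet-style edits into pandas source code."""
    lines = ["import pandas as pd", ""]
    for edit in edits:
        col = edit["column"]
        if edit["type"] == "delete_column":
            lines.append(f"{df_name} = {df_name}.drop(columns=[{col!r}])")
        elif edit["type"] == "filter_greater_than":
            value = edit["value"]
            lines.append(f"{df_name} = {df_name}[{df_name}[{col!r}] > {value!r}]")
        elif edit["type"] == "sort":
            ascending = edit["ascending"]
            lines.append(
                f"{df_name} = {df_name}.sort_values(by={col!r}, ascending={ascending!r})"
            )
        else:
            raise ValueError(f"unsupported edit type: {edit['type']}")
    return "\n".join(lines)


# A hypothetical edit log, as a spreadsheet UI might record it.
edits = [
    {"type": "delete_column", "column": "notes"},
    {"type": "filter_greater_than", "column": "price", "value": 10},
    {"type": "sort", "column": "price", "ascending": False},
]

code = generate_code("df", edits)
print(code)
```

Keeping the edit log as data and generating code from it means the same log could be replayed on a new file, which is one reason this design suits spreadsheet users moving to Python.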
What are some alternatives?
RasaGPT - 💬 RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram
qgrid - An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks
lmql - A language for constraint-guided and efficient LLM programming.
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
gpt_index - LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data. [Moved to: https://github.com/jerryjliu/llama_index]
appsmith - Platform to build admin panels, internal tools, and dashboards. Integrates with 25+ databases and any API.
pandas-ai - Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
dtale - Visualizer for pandas data structures
langchain - ⚡ Building applications with LLMs through composability ⚡ [Moved to: https://github.com/langchain-ai/langchain]
budibase - Budibase is an open-source low code platform that helps you build internal tools in minutes 🚀
rasa - 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
lux - Automatically visualize your pandas dataframe via a single print! 📊 💡