OpenAdapt
skyvern
Metric | OpenAdapt | skyvern
---|---|---
Mentions | 28 | 8
Stars | 681 | 5,236
Growth | 30.5% | 34.4%
Activity | 9.3 | 9.6
Latest commit | 3 days ago | 1 day ago
Language | Python | Python
License | MIT License | GNU Affero General Public License v3.0
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
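The page does not publish its exact formulas, so the following is only a minimal sketch of how metrics like these could be computed. Three things are assumptions: growth as a plain month-over-month percentage, commit recency modeled as exponential decay with a hypothetical 30-day half-life, and the 0-10 activity number read as 10 times a percentile rank. The last assumption matches the "9.0 means top 10%" example above. All function names are hypothetical.

```python
from datetime import date, timedelta


def mom_growth(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month growth in stars, as a percentage."""
    if stars_month_ago <= 0:
        raise ValueError("baseline star count must be positive")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_dates: list[date], today: date,
                             half_life_days: float = 30.0) -> float:
    """Sum commit weights that halve every `half_life_days` (assumed decay),
    so a commit made today counts twice as much as one made a half-life ago."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw_score: float, all_raw_scores: list[float]) -> float:
    """Map a raw score onto 0-10 by percentile rank among tracked projects:
    9.0 means the project outscores 90% of them, i.e. it is in the top 10%."""
    below = sum(1 for s in all_raw_scores if s < raw_score)
    return 10 * below / len(all_raw_scores)


today = date(2024, 6, 1)
print(mom_growth(130, 100))  # 30.0
print(recency_weighted_commits([today, today - timedelta(days=30)], today))  # 1.5
print(activity_score(10, list(range(1, 11))))  # 9.0
```

Using a rank instead of the raw weighted sum keeps the score comparable across projects of very different sizes. The trade-off is that the score only says where a project stands relative to the others.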
OpenAdapt
-
Llama 3-V: Matching GPT4-V with a 100x smaller model and 500 dollars
Our initial testing suggests MiniCPM outperforms InternVL for GUI understanding: https://github.com/OpenAdaptAI/OpenAdapt/issues/637#issuecom...
(InternVL appears to hallucinate more.)
-
Why MSFT Copilot+ and AI PCs are the final nail in the coffin of open computing
We have Linux support on the roadmap in https://github.com/OpenAdaptAI/OpenAdapt.
OpenAdapt has similar functionality, except:
- it's open source
- it only records when you explicitly tell it to
- it has multiple PII/PHI scrubbing providers built in (see https://github.com/OpenAdaptAI/OpenAdapt?tab=readme-ov-file#...)
- the purpose of recording is to automate tasks in desktop apps
- it's cross-platform (Mac and Windows now, Linux coming soon)
Full disclosure: I'm the primary author. Feedback welcome!
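The comment mentions multiple pluggable scrubbing providers but does not show their interface. The following is a minimal sketch of that pattern, not OpenAdapt's actual API: `ScrubbingProvider`, `RegexScrubber` and `scrub` are hypothetical names. The toy regex provider stands in for the real providers listed in the README.

```python
import re
from typing import Protocol


class ScrubbingProvider(Protocol):
    """Hypothetical interface: a provider returns text with PII/PHI redacted."""

    def scrub_text(self, text: str) -> str: ...


class RegexScrubber:
    """Toy provider that masks email addresses and US-style phone numbers."""

    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    }

    def scrub_text(self, text: str) -> str:
        for label, pattern in self.PATTERNS.items():
            text = pattern.sub(f"<{label}>", text)
        return text


def scrub(text: str, providers: list[ScrubbingProvider]) -> str:
    """Run every configured provider in turn over the same text."""
    for provider in providers:
        text = provider.scrub_text(text)
    return text


print(scrub("Mail jane.doe@example.com or call 555-123-4567", [RegexScrubber()]))
# Mail <EMAIL> or call <PHONE>
```

A shared interface lets you swap or chain providers (for example a regex pass followed by an ML-based detector) without changing the recording code.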
-
PaliGemma: Open-Source Multimodal Model by Google
Excited to test how this performs compared to MiniCPMv2, especially when analyzing GUI images: https://github.com/OpenAdaptAI/OpenAdapt/issues/637
-
Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o
Congratulations on shipping!
In https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt... we use FastSAM to first segment the UI elements, then have the LLM describe each segment individually. This seems to work quite well; see https://twitter.com/OpenAdaptAI/status/1789430587314336212 for a demo.
More coming soon!
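The "segment first, then describe each segment" approach from the comment can be sketched as a two-stage pipeline. This is a simplified illustration under stated assumptions, not OpenAdapt's code. The segmenter and describer are passed in as plain callables, and stubs stand in for FastSAM and the LLM. Segments are simplified to bounding boxes, whereas FastSAM produces masks. `Segment`, `crop` and `describe_ui` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

Image = list[list[int]]  # toy grayscale screenshot: rows of pixel values


@dataclass
class Segment:
    """Bounding box of one UI element (a stand-in for a segmentation mask)."""
    x: int
    y: int
    w: int
    h: int


def crop(image: Image, seg: Segment) -> Image:
    """Cut the segment's bounding box out of the screenshot."""
    return [row[seg.x:seg.x + seg.w] for row in image[seg.y:seg.y + seg.h]]


def describe_ui(image: Image,
                segment: Callable[[Image], list[Segment]],
                describe: Callable[[Image], str]) -> list[tuple[Segment, str]]:
    """Segment the screenshot first, then describe each region on its own."""
    return [(seg, describe(crop(image, seg))) for seg in segment(image)]


# Stubs in place of a real segmentation model and a real LLM/VLM call.
def fake_segmenter(img: Image) -> list[Segment]:
    return [Segment(0, 0, 4, 2), Segment(4, 2, 4, 2)]


def fake_describer(region: Image) -> str:
    return f"{len(region[0])}x{len(region)} region"


screenshot = [[0] * 8 for _ in range(4)]
for seg, text in describe_ui(screenshot, fake_segmenter, fake_describer):
    print(seg, text)
```

Describing each crop separately gives the model one small, focused region at a time instead of the whole screen.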
- GPT-4o
- Rabbit R1 can be run on an Android device
- OpenAdapt: AI-First Process Automation with Large Multimodal Models
- Adapter between LMMs and traditional desktop and web GUI
-
I Witnessed the Future of AI, and It's a Broken Toy
> Rabbit has said the device will be able to learn any app, if you teach it.
We're building this over at https://github.com/OpenAdaptAI/OpenAdapt. OpenAdapt learns to automate tasks in desktop apps by observing human demonstrations.
Early demo: https://twitter.com/abrichr/status/1784307190062342237 (more coming soon!)
The demo is intentionally simplistic to keep it short; OpenAdapt also works with arbitrary applications and operations.
Also, we're open source. Contributions and feedback are welcome and encouraged :)
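To make "learning from a demonstration" concrete, the simplest possible baseline is plain record-and-replay with parameter substitution. OpenAdapt's own approach uses large multimodal models and goes well beyond this. Below is a minimal sketch of that naive baseline only. `Action` and `replay_plan` are hypothetical names, and element targets are assumed to be stable string identifiers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Action:
    kind: str       # e.g. "click" or "type"
    target: str     # hypothetical identifier of the UI element acted on
    text: str = ""  # text entered, for "type" actions


def replay_plan(demo: list[Action], overrides: dict[str, str]) -> list[Action]:
    """Reuse a recorded demonstration, swapping in new text for any "type"
    action whose target appears in `overrides`; everything else is replayed as-is."""
    plan = []
    for action in demo:
        if action.kind == "type" and action.target in overrides:
            action = replace(action, text=overrides[action.target])
        plan.append(action)
    return plan


demo = [
    Action("click", "name_field"),
    Action("type", "name_field", "Alice"),
    Action("click", "submit_button"),
]
for step in replay_plan(demo, {"name_field": "Bob"}):
    print(step)
```

This baseline breaks as soon as the UI layout or element identifiers change. That brittleness is the gap model-based approaches aim to close.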
-
Memary is a cutting-edge long-term memory system based on a knowledge graph
Very interesting, thank you for making this available!
At OpenAdapt (https://github.com/OpenAdaptAI/OpenAdapt) we are looking into using pm4py (https://github.com/pm4py) to extract a process graph from a recording of user actions.
I will look into this more closely. In the meantime, could the authors share their perspective on whether Memary could be useful here?
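A common first step in extracting a process graph from recordings is a directly-follows graph (DFG). A DFG counts how often one action is immediately followed by another across all recorded traces. pm4py includes DFG discovery among its process-mining algorithms. The following plain-Python sketch shows only the underlying idea and does not use pm4py. The action names and `directly_follows_graph` are hypothetical.

```python
from collections import Counter


def directly_follows_graph(traces: list[list[str]]) -> Counter:
    """Count how often action a is immediately followed by action b
    across all recorded traces (edge weights of a directly-follows graph)."""
    dfg: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


recordings = [
    ["open_app", "click_search", "type_query", "press_enter"],
    ["open_app", "click_search", "type_query", "click_result"],
]
for (a, b), n in sorted(directly_follows_graph(recordings).items()):
    print(f"{a} -> {b}: {n}")
```

Edges with high counts mark the shared "main path" of a task, and branches such as `type_query -> press_enter` versus `type_query -> click_result` show where demonstrations diverge.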
skyvern
-
ScrapeGraphAI: Web scraping using LLM and direct graph logic
https://github.com/Skyvern-AI/skyvern
This is pretty much what we're building at Skyvern. The only problem is that inference cost is still a bit too high for scraping, but we expect that to change within the next year.
-
Show HN: Skyvern – open-source browser automation tool
This is a great point, and it's already on our roadmap. We call it "prompt caching", though as I write this I realize it's a terrible name. Will update! (https://github.com/Skyvern-AI/Skyvern?tab=readme-ov-file#fea...)
Thank you for this feedback.
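The comment does not explain how "prompt caching" would work. One common interpretation is a memoization layer in front of the LLM, so a repeated prompt reuses the stored response instead of paying for another inference call. The following is a minimal sketch under that assumption, not Skyvern's implementation. `PromptCache` and `fake_llm` are hypothetical. A real cache key would likely also need to capture page state.

```python
import hashlib
from typing import Callable


class PromptCache:
    """Hypothetical caching layer: an identical prompt reuses the stored
    response instead of triggering another (paid) inference call."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm
        self._store: dict[str, str] = {}
        self.misses = 0  # number of real LLM calls made

    def complete(self, prompt: str) -> str:
        # A fixed-length digest keeps keys small and easy to persist to disk.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._llm(prompt)
        return self._store[key]


def fake_llm(prompt: str) -> str:
    return f"action for: {prompt}"


cache = PromptCache(fake_llm)
cache.complete("Log in on example.com")
cache.complete("Log in on example.com")  # served from the cache
print(cache.misses)  # 1
```

Exact-match caching only helps when prompts repeat verbatim. Any dynamic content in the prompt, such as timestamps or session IDs, would need to be normalized out of the key first.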
-
LaVague: Open-source Large Action Model to automate Selenium browsing
We're also working in this space and just open-sourced Skyvern:
https://github.com/Skyvern-AI/Skyvern
What are some alternatives?
ios-mail - Secure email that protects your privacy
LaVague - Large Action Model framework to develop AI Web Agents
CogVLM - a state-of-the-art-level open visual language model (multimodal pretrained model)
browserpilot - Natural language browser automation
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
vimGPT - Browse the web with GPT-4V and Vimium
adept-inference - Inference code for Persimmon-8B
tarsier - Vision utilities for web interaction agents 👀
IfcOpenShell - Open source IFC library and geometry engine
mitta-community - Community repository for MittaAI users.
strawberry - A GraphQL library for Python that leverages type annotations 🍓
html2text - Convert HTML to Markdown-formatted text.