-
CodeRabbit
CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
-
AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Check out AutoGPT: you don't have to wait, it's already built.
https://github.com/Significant-Gravitas/AutoGPT
-
clickclickclick
A framework to enable autonomous Android and computer use with any LLM (local or remote).
Training a base model just for computer use seems like overkill: a general reasoning model such as o3 for planning, paired with a vision model such as gemini-flash, is good enough[1] without being trained specifically for computer use.
But if you still want to go down this path, Google has made the ScreenQA dataset (built on Rico screenshots) available[2], along with bounding boxes.
1. A framework to use local/hosted models for Android use/control - https://github.com/BandarLabs/clickclickclick
2. https://github.com/google-research-datasets/screen_qa
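The planner-plus-vision split described in the comment above can be sketched roughly as follows. This is a minimal illustration, not ClickClickClick's actual API: `Action`, `run_task`, and the `planner`, `finder`, `screenshot`, and `tap` callables are all hypothetical names, and in a real setup the planner would wrap a reasoning model while the finder would wrap a vision model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    kind: str         # "tap" or "done" (hypothetical action vocabulary)
    target: str = ""  # natural-language description of the UI element


# Hypothetical interfaces: a planner picks the next action from the goal and
# the history so far; a finder maps a screenshot and a description to (x, y).
Planner = Callable[[str, List[str]], Action]
Finder = Callable[[bytes, str], Tuple[int, int]]


def run_task(
    goal: str,
    planner: Planner,
    finder: Finder,
    screenshot: Callable[[], bytes],
    tap: Callable[[int, int], None],
    max_steps: int = 10,
) -> List[str]:
    """Alternate planning and element location until the planner says done."""
    history: List[str] = []
    for _ in range(max_steps):
        action = planner(goal, history)
        if action.kind == "done":
            break
        # The vision model only has to locate the element the planner named.
        x, y = finder(screenshot(), action.target)
        tap(x, y)
        history.append(f"tap {action.target} at ({x}, {y})")
    return history


def demo() -> Tuple[List[str], List[Tuple[int, int]]]:
    """Run the loop with stub models so the sketch works without any LLM."""
    scripted = iter([
        Action("tap", "Settings icon"),
        Action("tap", "Wi-Fi row"),
        Action("done"),
    ])
    taps: List[Tuple[int, int]] = []
    history = run_task(
        goal="open Wi-Fi settings",
        planner=lambda goal, hist: next(scripted),
        finder=lambda image, target: (len(target), 100),  # fake coordinates
        screenshot=lambda: b"",
        tap=lambda x, y: taps.append((x, y)),
    )
    return history, taps
```

Keeping the two roles behind plain callables means either model can be swapped (local or hosted) without touching the loop, which is the point the comment makes about not needing a purpose-trained model.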
-
screen_qa
The ScreenQA dataset was introduced in the paper "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots". It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico. It is intended for training and evaluating models capable of screen content understanding via question answering.
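As a rough illustration of what evaluating a model on question-answer pairs can look like, here is a token-overlap F1 scorer in the style commonly used for extractive QA. It is an assumption for illustration only: the listing does not say which metric ScreenQA uses, and `normalize`, `token_f1`, and `best_f1` are hypothetical helper names.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty answers agree; one empty answer against a non-empty one does not.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, references: List[str]) -> float:
    """Score against several human answers and keep the best match."""
    return max(token_f1(prediction, ref) for ref in references)
```

Taking the maximum over several reference answers is one way to handle questions where annotators phrased the same answer differently.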
Related posts
-
I built "ClickClickClick" 👆that lets me use/control phones 📱using plain text 💬
-
Skyvern Browser Agent 2.0: How We Reached State of the Art in Evals
-
What we learned copying all the best code assistants
-
Coconut by Meta AI – Better LLM Reasoning with Chain of Continuous Thought?
-
Click3 – Taking control of your phone using Gemini/OpenAI