R1 Computer Use

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. r1-computer-use

    Applying the ideas of Deepseek R1 to computer use

  2. AutoGPT

    AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

    Check out AutoGPT: you don't have to wait, it's already built.

    https://github.com/Significant-Gravitas/AutoGPT

  3. clickclickclick

    A framework to enable autonomous android and computer use using any LLM (local or remote)

    Training a base model just for computer use seems like overkill, since a general reasoning model like o3 for planning plus a vision model like gemini-flash is good enough[1] without being trained specifically for computer use (see the sketch after this list for what that planner + grounding loop looks like).

    But if you still want to try out this path, Google has made the ScreenQA dataset (Rico) available[2] along with bounding boxes.

    1. A framework to use local/hosted models for android use/control - https://github.com/BandarLabs/clickclickclick

    2. https://github.com/google-research-datasets/screen_qa

  4. screen_qa

    The ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K question-answer pairs collected by human annotators for ~35K screenshots from Rico, and is intended for training and evaluating models capable of screen content understanding via question answering.

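The two-model split described in the comment above (a reasoning model that plans the next step, a vision model that grounds that step in screen coordinates) boils down to a short loop. The sketch below is purely illustrative: every function name is a hypothetical stub, not the clickclickclick API or any particular model SDK.

    # Illustrative planner + grounding loop for LLM-driven computer/phone use.
    # All names are hypothetical stubs, not the clickclickclick API.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Action:
        kind: str                     # e.g. "tap", "type", "done"
        target: Optional[str] = None  # natural-language description of a UI element
        text: Optional[str] = None    # text to type, if any

    def plan_next_action(goal: str, history: list[str]) -> Action:
        """Ask a reasoning model (e.g. o3) for the next high-level step (stub)."""
        raise NotImplementedError

    def locate_element(screenshot: bytes, description: str) -> tuple[int, int]:
        """Ask a vision model (e.g. gemini-flash) for the (x, y) of the described element (stub)."""
        raise NotImplementedError

    def take_screenshot() -> bytes:
        """Capture the current screen, e.g. via adb screencap (stub)."""
        raise NotImplementedError

    def execute(action: Action, xy: Optional[tuple[int, int]]) -> None:
        """Dispatch the action to the device, e.g. via adb input (stub)."""
        raise NotImplementedError

    def run(goal: str, max_steps: int = 20) -> None:
        """Alternate planning and grounding until the planner reports completion."""
        history: list[str] = []
        for _ in range(max_steps):
            action = plan_next_action(goal, history)
            if action.kind == "done":
                return
            # Ground the planner's description into screen coordinates, if needed.
            xy = locate_element(take_screenshot(), action.target) if action.target else None
            execute(action, xy)
            history.append(f"{action.kind} {action.target or ''} {action.text or ''}".strip())

The point of the split is that neither model needs to be trained for computer use: the planner only sees the goal and a text history, and the vision model only answers "where is this element on the screen".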

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • I built "ClickClickClick" 👆that lets me use/control phones 📱using plain text 💬

    1 project | dev.to | 18 Jan 2025
  • Skyvern Browser Agent 2.0: How We Reached State of the Art in Evals

    2 projects | news.ycombinator.com | 17 Jan 2025
  • What we learned copying all the best code assistants

    1 project | news.ycombinator.com | 4 Jan 2025
  • Coconut by Meta AI – Better LLM Reasoning with Chain of Continuous Thought?

    1 project | news.ycombinator.com | 31 Dec 2024
  • Click3 – Taking control of your phone using Gemini/OpenAI

    1 project | news.ycombinator.com | 19 Dec 2024