Using GPT-4 Vision with Vimium to browse the web

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • vimGPT

    Browse the web with GPT-4V and Vimium

  • It's insane that this is now possible:

    https://github.com/ishan0102/vimGPT/blob/682b5e539541cd6d710...

    > "You need to choose which action to take to help a user do this task: {objective}. Your options are navigate, type, click, and done. Navigate should take you to the specified URL. Type and click take strings where if you want to click on an object, return the string with the yellow character sequence you want to click on, and to type just a string with the message you want to type. For clicks, please only respond with the 1-2 letter sequence in the yellow box, and if there are multiple valid options choose the one you think a user would select. For typing, please return a click to click on the box along with a type with the message to write. When the page seems satisfactory, return done as a key with no value. You must respond in JSON only with no other fluff or bad things will happen. The JSON keys must ONLY be one of navigate, type, or click. Do not return the JSON inside a code block."

  • GPT-V-on-Web

    👀🧠 GPT-4 Vision x 💪⌨️ Vimium = Autonomous Web Agent

  • Omg I also just released something pretty similar earlier today https://github.com/Jiayi-Pan/GPT-V-on-Web. But it received little attention.

  • CogVLM

    a state-of-the-art-level open visual language model | Multimodal pretrained model

  • There are open source models such as https://github.com/THUDM/CogVLM and https://github.com/haotian-liu/LLaVA.

  • LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

  • adept-inference

    Inference code for Persimmon-8B

  • Ah, very similar to Adept’s[1] concept? Though, their product seems not yet ready.

    [1] https://www.adept.ai/

  • vim-agent

  • I think vim is unintentionally a great “embodiment” for chatgpt. There’s nothing that can’t be done with a stream of text, and the internet is full of vimscript already

    I started a similar experiment if anyone else is thinking along the same lines :)

    https://github.com/LachlanGray/vim-agent

  • BrowserBox

    🌀 Browse the web from a browser you run on a server, rather than on your local device. Lightweight virtual browser. For security, privacy and more! By https://github.com/dosyago

  • Hi jimmySixDOF thank you for the kind words and the attention on our project! :)

    Regarding pricing we have heard that feedback over time and gradually adjusted our licensing costs. It should now be much more affordable as it is targeted towards large deployments, with decreasing cost and increasing value at scale.

    If you'd like to send an email with any thoughts on our current prices on https://dosyago.com to [email protected] I'd highly value it!

    Your idea of WebXR and embedding within Unity is very interesting, and I think it could be a fit.

  • OpenAdapt

    AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

  • This type of use case is exactly why we are building https://github.com/OpenAdaptAI/OpenAdapt
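
The vimGPT prompt quoted earlier asks the model to answer in JSON only, with the keys navigate, type, click, or done. As a rough illustration of what consuming such a reply might look like, here is a minimal sketch. The function name `parse_actions`, the click-before-type ordering, and the validation rules are assumptions made for this example and are not taken from the vimGPT source.

```python
from __future__ import annotations

import json

# Keys named in the quoted prompt. The execution order is an assumption:
# the prompt asks for a click on the box together with the text to type,
# so click is placed before type here.
ACTION_ORDER = ("navigate", "click", "type")


def parse_actions(reply: str) -> list[tuple[str, str | None]]:
    """Turn a JSON reply into an ordered list of (action, argument) pairs."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Reject anything outside the four keys the prompt allows.
    unknown = set(data) - set(ACTION_ORDER) - {"done"}
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")

    actions: list[tuple[str, str | None]] = []
    for key in ACTION_ORDER:
        if key in data:
            value = data[key]
            if not isinstance(value, str) or not value:
                raise ValueError(f"{key!r} needs a non-empty string")
            actions.append((key, value))

    # "done" is described as a key with no value, so only its presence matters.
    if "done" in data:
        actions.append(("done", None))
    return actions
```

For example, `parse_actions('{"click": "AB", "type": "vimium"}')` returns the click on the hint "AB" followed by the text to type, and a reply containing any other key raises `ValueError` instead of being acted on.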

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
