Show HN: Skyvern – open-source browser automation tool

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • skyvern

    Automate browser-based workflows with LLMs and Computer Vision

  • https://github.com/Skyvern-AI/skyvern/blob/d0935755963b017ed...

    We also report the cost of each step in the visualizer: click any task > Steps, and there's a column showing how much each step cost to run.

    https://github.com/Skyvern-AI/skyvern/issues/70

    2. We have a roadmap item to "cache" or "memorize" specific tasks, so you pay the cost once and can then rerun the task over and over. We're going to get to it soon!

  • LaVague

    Large Action Model framework to turn natural language into browser actions

  • We're quite different from LaVague. LaVague passes the entire HTML DOM to the LLM to help it generate XPaths and valid Selenium code. (https://github.com/lavague-ai/LaVague/blob/main/src/lavague/...)

    Try this at your own risk: any reasonably sized website will result in extraordinarily high input token costs.

    We spend quite a bit of our time building a layer between the HTML and the LLM call that distills the page down to the actions the LLM can take, which gives a better trade-off between cost and output quality. We're still not at 100% coverage.

  • self-operating-computer

    A framework to enable multimodal models to operate a computer.

  • This is quite different from https://github.com/OthersideAI/self-operating-computer

    Self-operating-computer uses pixel mapping to control your computer. It's a very good approach, but it's extremely unreliable: GPT-4V frequently hallucinates pixel coordinates, causing it to miss interactions or get stuck in failure loops.

    >The approach by AI Jason

    AI Jason is using image-only methods to interact with the browser. This is a great first step, but the approach tends to be rife with hallucinations and errors. We do DOM parsing in addition to image analysis to help GPT-4V correlate information in the image with the interactable elements in the DOM. This dramatically improves its ability to perform the same task reliably over and over again (which proved impossible with the image-only approach).

  • vimGPT

    Browse the web with GPT-4V and Vimium

  • OpenAdapt

    AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

  • Congratulations on shipping!

    Check out https://github.com/OpenAdaptAI/OpenAdapt for an open-source (MIT-licensed) alternative that also works on the desktop (including Citrix!)

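The Skyvern authors contrast their approach with passing the entire HTML DOM to the LLM. Instead, they describe a layer that distills the page down to the actions the model can take.

Below is a minimal sketch of that idea in Python. It is not Skyvern's code, and it rests on two assumptions of its own:
- Only links, buttons, and form fields matter.
- A handful of attributes is enough to describe each element.

`InteractableExtractor`, `distill`, `INTERACTABLE_TAGS`, and `KEPT_ATTRS` are hypothetical names. Only the standard-library `html.parser` module is real.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Tags treated as interactable in this sketch (an assumption, not Skyvern's list).
INTERACTABLE_TAGS = {"a", "button", "input", "select", "textarea"}
# Attributes kept per element; everything else (classes, styles, data-*) is dropped.
KEPT_ATTRS = ("name", "type", "href", "placeholder", "aria-label")
# Tags whose inner text is collected as the element's label.
TEXT_TAGS = {"a", "button", "select", "textarea"}


class InteractableExtractor(HTMLParser):
    """Collects a compact summary of interactable elements from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[dict] = []
        self._open: dict | None = None  # element whose inner text is being collected

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTABLE_TAGS:
            return  # scripts, styles and layout tags never reach the prompt
        attr_map = {k: v for k, v in attrs if v is not None}
        element = {
            "id": len(self.elements),
            "tag": tag,
            "attrs": {k: attr_map[k] for k in KEPT_ATTRS if k in attr_map},
            "text": "",
        }
        self.elements.append(element)
        if tag in TEXT_TAGS:
            self._open = element

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            # Collapse runs of whitespace in the collected label.
            self._open["text"] = " ".join(self._open["text"].split())
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data


def distill(html: str) -> str:
    """Return one numbered line per interactable element, for use in a prompt."""
    parser = InteractableExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for el in parser.elements:
        parts = [el["tag"]] + [f'{k}="{v}"' for k, v in el["attrs"].items()]
        lines.append(f"[{el['id']}] <{' '.join(parts)}> {el['text']}".rstrip())
    return "\n".join(lines)


page = """
<html><head><script>var tracking = 1;</script><style>.x{color:red}</style></head>
<body>
  <div class="nav"><a href="/login">Sign in</a></div>
  <form>
    <input type="email" name="email" placeholder="Email">
    <button type="submit">  Continue  </button>
  </form>
</body></html>
"""
print(distill(page))
# [0] <a href="/login"> Sign in
# [1] <input name="email" type="email" placeholder="Email">
# [2] <button type="submit"> Continue
```

Scripts, styles, and layout markup never reach the prompt, which is where the input-token savings come from. The price is coverage. Anything outside `INTERACTABLE_TAGS` is silently dropped, for example a clickable `div` with a JavaScript handler. This echoes the authors' remark that they are not yet at 100% coverage.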
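The Skyvern authors contrast two approaches:
- Pixel-only control, where GPT-4V can hallucinate coordinates.
- Combining DOM parsing with image analysis, so the model can relate what it sees to interactable elements.

Below is a minimal sketch of that correlation step. It is not Skyvern's actual code, and it assumes:
- Each element's bounding box has already been read from the DOM by the browser driver.
- The model is asked to answer with an element ID rather than with coordinates.

Drawing the IDs onto the screenshot is omitted. `Element`, `describe`, and `resolve_click` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """An interactable DOM element plus where it appears in the screenshot."""
    element_id: int
    tag: str
    text: str
    box: tuple[int, int, int, int]  # left, top, width, height in screenshot pixels


def describe(elements: list[Element]) -> str:
    """Text listing sent to the model alongside the labeled screenshot."""
    return "\n".join(f"[{e.element_id}] {e.tag} {e.text!r}" for e in elements)


def resolve_click(elements: list[Element], chosen_id: int) -> tuple[int, int]:
    """Map the model's chosen element ID back to the center of its DOM box.

    The model never emits raw coordinates, so an ID that matches no element
    is rejected instead of becoming a click on an arbitrary pixel.
    """
    by_id = {e.element_id: e for e in elements}
    if chosen_id not in by_id:
        raise ValueError(f"model chose unknown element id {chosen_id}")
    left, top, width, height = by_id[chosen_id].box
    return left + width // 2, top + height // 2


elements = [
    Element(0, "a", "Sign in", (900, 20, 80, 30)),
    Element(1, "button", "Continue", (400, 500, 120, 40)),
]
print(describe(elements))
print(resolve_click(elements, 1))  # (460, 520)
```

The click position always comes from the DOM box. A hallucinated answer therefore surfaces as an unknown ID that can be caught and retried, rather than as a click on the wrong pixel.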