Ask HN: Is anybody getting value from AI Agents? How so?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • plandex

    An AI coding engine for building complex, real-world software with LLMs

  • I'm working on an agent-based tool for software development. I'm getting quite a lot of value out of it. The intention is to minimize copy-pasting and work on complex, multi-file features that are too large for ChatGPT, Copilot, and other AI development tools I've tried.

    https://github.com/plandex-ai/plandex

    It's working quite well, though I'm still working out some kinks.

    I think the key to agents that really work is understanding the limitations of the models and working around them rather than trying to do everything with the LLM.

    In the context of software development, imo we are currently at the stage of developer-AI symbiosis and probably will be for some time. We aren't yet at the stage where it makes sense to try to get an agent to code and debug complex tasks end-to-end. Trying to do this is a recipe for burning lots of tokens and spending more time than it would take to build something yourself. But if you follow the 80/20 rule and get the AI to do the bulk of the work, intervening frequently to keep it on track and then polishing it at the end, huge productivity gains are definitely in reach.

  • rbenv

    Manage your app's Ruby environment

    When I was blogging about how to learn from open-source code [1], I used it quite frequently to get unstuck and/or to figure out how to tease apart a large question into multiple smaller questions. For example, I had no idea how to break up this long `sed` command [2] into its constituent parts, so I plugged it into ChatGPT and asked it to break down the code for me. I then Googled the different parts to confirm that ChatGPT wasn't leading me astray.

    If I had asked StackOverflow the same question, it would have been quickly closed as being not broadly applicable enough (since this `sed` command is quite specific to its use case). After ChatGPT broke the code apart for me, I was able to ask StackOverflow a series of more discrete, more broadly-applicable questions and get a human answer.

    TL;DR- I quite like ChatGPT as a search engine when "you don't know what you don't know", and getting unblocked means being pointed in the right direction.

    1. https://www.richie.codes/shell

    2. https://github.com/rbenv/rbenv/blob/e8b7a27ee67a5751b899215b...

  • nolita

    A full-stack framework for building agentic applications

  • Full disclaimer up top: I have been working on agents for about a year now building what would eventually become HDR [1][2].

    The first issue is that agents have extremely high failure rates. Agents really don't have the capacity to learn from either success or failure since their internal state is fixed after training. If you ask an agent to repeatedly do some task, it has a chance of failing every single time. We have been able to largely mitigate this by modeling agentic software as a state machine. At every step we have the model choose the inputs to the state machine and then we record them. We then 'compile' the resulting state-transition table down into a program that we can execute deterministically. This isn't totally foolproof since the world state can change between program runs, so we have methods that allow the LLM to make slight modifications to the program as needed. The idea here is that agents should never have to solve the same problem twice. The cool thing about this approach is that smarter models make the entire system work better. If you have a particularly complex task, you can call out to gpt-4-turbo or claude-3-opus to map out the correct action sequence and then fall back to less complex models like Mistral 7B.
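    The record-then-replay idea above can be sketched roughly as follows. This is a minimal illustration, not HDR's actual implementation: the class names and the `fake_llm` stand-in are invented, and a real system would re-invoke the model when replay hits a state missing from the compiled table.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class AgentStateMachine:
        transitions: list = field(default_factory=list)  # recorded (state, action) pairs

        def record_run(self, choose_action, start_state):
            """First run: `choose_action` stands in for an LLM call that
            maps the current state to the next action."""
            state = start_state
            while state != "done":
                action = choose_action(state)
                self.transitions.append((state, action))
                state = action["next_state"]
            return self.compile()

        def compile(self):
            """'Compile' the transition table into a deterministic program:
            a plain dict lookup, no LLM needed on replay."""
            return {state: action for state, action in self.transitions}

    def replay(program, start_state):
        """Replay the compiled program deterministically; a real system
        would fall back to the LLM if a state is missing from the table."""
        state, trace = start_state, []
        while state != "done":
            action = program[state]
            trace.append(action["name"])
            state = action["next_state"]
        return trace

    # Toy stand-in for an LLM that picks actions for the demo run.
    def fake_llm(state):
        plan = {
            "open_page": {"name": "click_login", "next_state": "login_form"},
            "login_form": {"name": "submit_credentials", "next_state": "done"},
        }
        return plan[state]

    machine = AgentStateMachine()
    program = machine.record_run(fake_llm, "open_page")
    print(replay(program, "open_page"))  # ['click_login', 'submit_credentials']
    ```

    The payoff is that the expensive model is only consulted on the first run (or on drift); every subsequent run is a cheap, deterministic dictionary lookup.
    
    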

    The second issue is that almost all software is designed for people, not LLMs. What is intuitive for human users may not be intuitive for non-human users. We're focused on making agents reliably interact with the internet, so I'll use web pages as an example. Web pages contain tons of visually encoded information in things like the layout hierarchy, images, etc. But most LLMs rely on text-only inputs. You can try exposing the underlying HTML or the DOM to the model, but this doesn't work so well in practice. We get around this by treating LLMs as if they were visually impaired users. We give them a purely text interface by using ARIA trees. This interface is much more compact than either the DOM or HTML, so responses come back faster and cost way less.
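    To make the ARIA-tree idea concrete, here is a hedged sketch of flattening an accessibility tree into compact role/name lines for a text-only model. The nested-dict node format is invented for illustration; in practice the tree would come from the browser's accessibility APIs rather than being hand-written.

    ```python
    def flatten_aria(node, depth=0, out=None):
        """Depth-first walk that keeps only each node's role and accessible
        name, discarding layout, styling, and other visual-only detail."""
        if out is None:
            out = []
        out.append("  " * depth + f'{node["role"]} "{node.get("name", "")}"')
        for child in node.get("children", []):
            flatten_aria(child, depth + 1, out)
        return out

    # Toy accessibility tree for a checkout page (invented for the demo).
    page = {
        "role": "WebArea", "name": "Checkout",
        "children": [
            {"role": "heading", "name": "Your cart"},
            {"role": "button", "name": "Place order"},
            {"role": "textbox", "name": "Promo code"},
        ],
    }

    print("\n".join(flatten_aria(page)))
    ```

    A few dozen role/name lines like these typically replace kilobytes of markup, which is where the latency and cost savings come from.
    
    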

    The third issue I see with people building agents is they go after the wrong class of problem. I meet a lot of people who want to use agents for big ticket items such as planning an entire trip + doing all the booking. The cost of a trip can run into the thousands of dollars and be a nightmare to undo if something goes wrong. You really don't want to throw agents at this kind of problem, at least not yet, because the downside to failure is so high. Users generally want expensive things to be done well and agents can't do that yet.

    However, there are a ton of things I would like someone to do for me that would cost less than five dollars of someone's time, where the stakes for things going wrong are low. My go-to example is making reservations. I really don't want to spend the time sorting through the hundreds of nearby restaurants. I just want to give something the general parameters of what I'm looking for and have reservations show up in my inbox. These are the kinds of tasks that agents are going to accelerate.

    [1] https://github.com/hdresearch/hdr-browser

  • nagato-ai

    Simple cross-LLM AI Agent library

    I get the same feeling. AI agents sound very cool, but reliability is a huge issue right now.

    The fact that you can get vastly different outcomes for similar runs (even while using Claude 3 Opus with tool/function calling) can drive you insane. I read somewhere down in this thread that one way to mitigate these problems is by implementing a robust state machine. I reckon this can help, but I also believe that somehow leveraging memory from previous runs could be useful too. It's not fully clear in my mind how to go about doing this.
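    One way the "memory from previous runs" idea could look in practice is a small store of run outcomes that gets surfaced as extra context on the next attempt. This is a speculative sketch with invented names, not an existing library API:

    ```python
    from collections import defaultdict

    class RunMemory:
        """Persists each run's steps and outcome, keyed by task name."""

        def __init__(self):
            self._runs = defaultdict(list)

        def record(self, task, steps, success):
            self._runs[task].append({"steps": steps, "success": success})

        def context_for(self, task):
            """Build a prompt snippet from prior runs so the model can
            imitate what worked and avoid what failed."""
            lines = []
            for run in self._runs[task]:
                tag = "WORKED" if run["success"] else "FAILED"
                lines.append(f"[{tag}] " + " -> ".join(run["steps"]))
            return "\n".join(lines)

    memory = RunMemory()
    memory.record("book_table", ["search", "pick_slot", "confirm"], success=True)
    memory.record("book_table", ["search", "confirm"], success=False)
    print(memory.context_for("book_table"))
    ```

    Prepending this snippet to the agent's prompt is a cheap approximation of learning across runs, since the model weights themselves stay fixed.
    
    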

    I'm still very excited about the space though. It's a great place to be and I love the energy but also measured enthusiasm from everyone who is trying to push the boundaries of what is possible with agents.

    I'm currently also tinkering with my own Python AI Agent library to further my understanding of how they work: https://github.com/kenshiro-o/nagato-ai . I don't expect it to become the standard but it's good fun and a great learning opportunity for me :).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
