-
hallucination-leaderboard
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
-
autogen
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
-
SuperAGI
<⚡️> SuperAGI - A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.
Excited about GPT4-Turbo and longer sequence lengths. Very much looking forward to faster inference. We just released Vectara's "Hallucination Evaluation Model" (aka HEM) today https://huggingface.co/vectara/hallucination_evaluation_mode..., with a leaderboard: https://github.com/vectara/hallucination-leaderboard
When I saw this, I figured it was a clear step back from simply using plugins in the main ChatGPT view. It's basically plugins, but with extra prompting and you can only use one at a time.
But if you look at projects like AutoGen (https://github.com/microsoft/autogen), you see one master agent coordinating other agents that each have a narrower scope. The catch is that you have to write the prompts for those agents yourself.
This GPTs setup will crowd-source a ton of data on creating agents which serve a single task. Then they can train a model that's very good at creating other agents. Altman nods to this immediately after the GPTs feature is shown, repeating that OpenAI does not train on API usage.
Prediction: next year's dev day, which Altman hints will make today's conference look "quaint" by comparison, will basically be an app built around the AutoGen concept, one that you can spin up with very simple prompts to complete very complex tasks. Probably on top of a mixture of GPT-4 and GPT-5.
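For anyone who hasn't seen the pattern, here is a rough sketch of the "one coordinator, several narrow agents" idea mentioned above. This is plain Python and deliberately does not use AutoGen's actual API. The names `NarrowAgent` and `Coordinator`, the `name: payload` request format, and the handler functions standing in for LLM calls are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NarrowAgent:
    """A hypothetical single-task agent (not an AutoGen class)."""
    name: str
    system_prompt: str              # hand-written prompt; stored but not sent anywhere in this sketch
    handler: Callable[[str], str]   # stand-in for a real LLM call

    def run(self, payload: str) -> str:
        return self.handler(payload)


class Coordinator:
    """A hypothetical master agent that dispatches requests to narrow agents."""

    def __init__(self, agents: List[NarrowAgent]) -> None:
        self.agents: Dict[str, NarrowAgent] = {a.name: a for a in agents}

    def run(self, request: str) -> str:
        # Assumed request format: "<agent name>: <payload>".
        # A real coordinator would likely ask a model which agent to use.
        name, _, payload = request.partition(":")
        name = name.strip().lower()
        if name not in self.agents:
            raise ValueError(f"no agent registered for {name!r}")
        return self.agents[name].run(payload.strip())


# Two toy agents; the handlers are ordinary functions, not model calls.
coordinator = Coordinator([
    NarrowAgent(
        name="summarize",
        system_prompt="You summarize short documents in one sentence.",
        handler=lambda text: text.split(".")[0].strip() + ".",
    ),
    NarrowAgent(
        name="count",
        system_prompt="You count the words in the input.",
        handler=lambda text: str(len(text.split())),
    ),
])

if __name__ == "__main__":
    print(coordinator.run("summarize: First sentence. Second sentence."))
    print(coordinator.run("count: a b c"))
```

The point of the sketch is only the division of labor: each agent carries its own hand-written prompt and a narrow job, and the coordinator decides who handles what.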