-
evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
To save others the trouble, I googled Voyager; it's pretty interesting. I had no idea an LLM could do this sort of thing:
https://voyager.minedojo.org/
"What if" is all these "existential risk" conversations ever are.
Where is your evidence that we're approaching human level AGI, let alone SuperIntelligence? Because ChatGPT can (sometimes) approximate sophisticated conversation and deep knowledge?
How about some evidence that ChatGPT isn't even close? Just clone OpenAI's own evals repo (https://github.com/openai/evals) and run it against the GPT-4 API.
It performs terribly on novel logic puzzles and exercises that a clever child could learn to do in an afternoon (there are some good chess evals, and I submitted one asking it to simulate a Forth machine).
Other examples (in the real world) you might find interesting:
https://tidybot.cs.princeton.edu/