-
arena-ai-leaderboards
📊 Daily auto-updated snapshots of all Arena AI (LMSYS Chatbot Arena) leaderboards — LLM, Vision, Code, Video, Image & more. Structured JSON with historical tracking.
https://arena.ai/leaderboard - I’ve found this company is a pretty good ranker - not sure their exact methodology but during day to day programming with Claude / gpt models I’ve felt qualitatively what they report
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
Where did you get that idea? It uses mini-swe-agent, same as SWE-Bench.
https://github.com/datacurve-ai/deep-swe
-
They just (minutes ago) updated the "What's new in Opus 4.8" documentation: https://platform.claude.com/docs/en/about-claude/models/what...
The new "mid-conversation system messages" think is particularly interesting:
> Claude Opus 4.8 accepts role: "system" messages immediately after a user turn in the messages array (subject to placement rules). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves prompt cache hits on the earlier turns and reduces input cost on agentic loops. No beta header is required. See Mid-conversation system messages for usage details.
Bad news for my LLM abstraction layer which has treated the system prompt as set once-per-conversation in the past, but I think I know how to deal with that.
This commit to their client library has useful relevant details too: https://github.com/anthropics/anthropic-sdk-python/commit/2b...
-
claude-code-system-prompts
All parts of Claude Code's system prompt, 27 builtin tool descriptions, sub agent prompts (Plan/Explore/Task), utility prompts (CLAUDE.md, compact, statusline, magic docs, WebFetch, Bash cmd, security review, agent creation). Updated for each Claude Code version.
It's interesting that (for example for the explore agent https://github.com/Piebald-AI/claude-code-system-prompts/blo... ) they use a personality "you are a file search specialist" and "your strengths" framing. I thought that was largely thought to be useless, or even counterproductive nowadays? Does anyone know more about this stuff?
-
As an aside, some of the renders have only a single side connection to the wheel and that is a valid bike design, the Cannondale Lefty front fork only has a left leg:
https://duckduckgo.com/?q=cannondale+lefty&iar=images&t=ffab
Related posts
-
The Best LLMs for Agentic Coding in 2026 (Real-World, Not Just Benchmarks)
-
How to Write a CLAUDE.md Rule That Actually Gets Enforced
-
What Happens When You Evaluate a B2B Sales Agent on Tasks It Was Never Designed For
-
Letting Claude Code's Routines continuously tune my CLI's performance
-
Show HN: We benchmarked 18 LLMs on OCR (7K+ calls) – cheaper models win