deep-swe
anthropic-sdk-python
| deep-swe | anthropic-sdk-python | |
|---|---|---|
| 11 | 7 | |
| 101 | 3,619 | |
| 0.0% | 8.6% | |
| - | 9.5 | |
| 22 days ago | 7 days ago | |
| Shell | Python | |
| - | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
deep-swe
-
AWS Bedrock to require sharing data with Anthropic for Mythos and future models
That remains to be seen.
It's notable that Anthropic are still using SWEBench as a coding benchmark rather that the newer more difficult DeepSWE which shows them well behind GPT 5.5
https://deepswe.datacurve.ai/
Bear in mind that all the marketing efforts such as solving Erdos problem are the result of concerted RL training to impart those narrow capabilities, and how much of any benchmark results, or paid shill vibe reports, reflect improved performance for more general real-world use cases remains to be seen.
-
DeepSeek V4 Pro beats GPT-5.5 Pro on precision
This benchmark draws a very different picture having GPT5.5 on the very top with 70% and DeepSeek at 8%
https://deepswe.datacurve.ai
- DeepSWE results are unreliable โ 3/3 DSv4 "failed" tasks solved with same model
- DeepSWE: Measuring frontier coding agents on original, long-horizon SWE tasks
- DeepSWE Audit: DeepSeek-v4-pro results are unreliable
-
DeepSWE: More and cheaper intelligence from maxed GPT 5.5 than maxed Opus 4.8
Source: https://deepswe.datacurve.ai
Just select the two models from the drop down.
-
Claude Opus 4.8
Where did you get that idea? It uses mini-swe-agent, same as SWE-Bench.
https://github.com/datacurve-ai/deep-swe
- DeepSWE: Measuring coding agents on original, long-horizon engineering tasks
- DeepSWE Measuring frontier coding agents
anthropic-sdk-python
-
Claude Opus 4.8
They just (minutes ago) updated the "What's new in Opus 4.8" documentation: https://platform.claude.com/docs/en/about-claude/models/what...
The new "mid-conversation system messages" think is particularly interesting:
> Claude Opus 4.8 accepts role: "system" messages immediately after a user turn in the messages array (subject to placement rules). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves prompt cache hits on the earlier turns and reduces input cost on agentic loops. No beta header is required. See Mid-conversation system messages for usage details.
Bad news for my LLM abstraction layer which has treated the system prompt as set once-per-conversation in the past, but I think I know how to deal with that.
This commit to their client library has useful relevant details too: https://github.com/anthropics/anthropic-sdk-python/commit/2b...
- Claude prompt-cache writes may not be immediately visible to the next request
- Tokenizer de Claude 4.7: 1.47x mรกs tokens medidos vs Claude 4.6
-
How to Build Your First AI Agent: A Step-by-Step Tutorial
Anthropic Python SDK -- The official Python client for Claude.
-
Using Claude and Llms as Your DevOps & Platform Engineering Assistant
The Anthropic SDK: https://github.com/anthropics/anthropic-sdk-python
-
Generating Python code using Anthropic API for Claude AI
We're using the Anthropic SDK library for Python to access the API:
- Anthropic's Python SDK (safety-first language model APIs)
What are some alternatives?
arena-ai-leaderboards - ๐ Daily auto-updated snapshots of all Arena AI (LMSYS Chatbot Arena) leaderboards โ LLM, Vision, Code, Video, Image & more. Structured JSON with historical tracking.
claude-code-sdk-python - [Moved to: https://github.com/anthropics/claude-agent-sdk-python]
claude-code-system-prompts - All parts of Claude Code's system prompt, 27 builtin tool descriptions, sub agent prompts (Plan/Explore/Task), utility prompts (CLAUDE.md, compact, statusline, magic docs, WebFetch, Bash cmd, security review, agent creation). Updated for each Claude Code version.
mappertec