deep-swe
duckduckgo-locales
| | deep-swe | duckduckgo-locales |
|---|---|---|
| | 11 | 2,381 |
| Stars | 101 | 112 |
| Growth | 0.0% | 1.8% |
| Activity | - | 9.9 |
| Last commit | 22 days ago | 4 days ago |
| Language | Shell | Perl |
| | - | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
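The exact formula behind the activity number is not given here, so the following is only a minimal sketch of how such a metric could work. It assumes an exponential recency weight with a hypothetical 30-day half-life and a simple percentile rank scaled to 0-10; the function names `weighted_commit_score` and `activity_rating` and the `half_life_days` parameter are illustrative, not the site's actual implementation.

```python
def weighted_commit_score(commit_ages_days, half_life_days=30.0):
    """Sum per-commit weights, where a commit's weight halves every
    `half_life_days` (assumed value), so recent commits count more."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_rating(score, all_scores):
    """Map a raw score to a 0-10 scale by percentile rank:
    10 * (fraction of tracked projects with a strictly lower score)."""
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)


# Hypothetical data: ten tracked projects with raw scores 1..10.
tracked = list(range(1, 11))
top_rating = activity_rating(10, tracked)  # 9 of 10 projects score lower
```

Under these assumptions, a project whose score exceeds that of 90% of tracked projects gets a rating of 9.0, which matches the "top 10%" reading described above.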
deep-swe
-
AWS Bedrock to require sharing data with Anthropic for Mythos and future models
That remains to be seen.
It's notable that Anthropic are still using SWEBench as a coding benchmark rather than the newer, more difficult DeepSWE, which shows them well behind GPT 5.5
https://deepswe.datacurve.ai/
Bear in mind that all the marketing efforts, such as solving an Erdos problem, are the result of concerted RL training to impart those narrow capabilities. How much any benchmark results, or paid shill vibe reports, reflect improved performance for more general real-world use cases remains to be seen.
-
DeepSeek V4 Pro beats GPT-5.5 Pro on precision
This benchmark paints a very different picture, with GPT5.5 at the very top with 70% and DeepSeek at 8%
https://deepswe.datacurve.ai
- DeepSWE results are unreliable – 3/3 DSv4 "failed" tasks solved with same model
- DeepSWE: Measuring frontier coding agents on original, long-horizon SWE tasks
- DeepSWE Audit: DeepSeek-v4-pro results are unreliable
-
DeepSWE: More and cheaper intelligence from maxed GPT 5.5 than maxed Opus 4.8
Source: https://deepswe.datacurve.ai
Just select the two models from the drop-down.
-
Claude Opus 4.8
Where did you get that idea? It uses mini-swe-agent, same as SWE-Bench.
https://github.com/datacurve-ai/deep-swe
- DeepSWE: Measuring coding agents on original, long-horizon engineering tasks
- DeepSWE Measuring frontier coding agents
duckduckgo-locales
-
Policy on the AI Exponential
> except to the extent that dumb mistakes might result in danger
That "except" goes all the way up to starting WW3. Or a leak from a viral research lab, and by "leak" I mean "mail order" and by "research lab" I mean "the companies who already ship custom DNA and RNA retroviruses": https://duckduckgo.com/?q=companies+who+already+ship+custom+...
If you can prove that simply not training on horror stories would work, it would make a lot of people very happy.
Unfortunately, I don't think it does anything to solve, for example, Elon Musk just plain asking some future version of Grok to take over the world for him.
Nor would merely failing to include them in training data stop certain entire fictional scenarios, such as that Doctor Who episode where the android repair bots weren't told that the crew were off-limits as spare parts, or the other Doctor Who episode where the utilitarian robots started killing everyone who was upset because they calculated net positive utility from upset people ceasing to exist.
- DuckDuckGo displays a special logo when you search for FreeBSD
-
"They're made out of weights"
No, please. EXPLAIN
wtf does this mean, in the very precise, very meaningful, so clear and direct German?
https://duckduckgo.com/?t=ffab&q=%22Die+Zeitlichkeit+zeitigt...
-
United Airlines 767 Returns to Newark After Bluetooth Name Sparks Alert
It's also not the only one. There are at least like a dozen of them: https://duckduckgo.com/?ia=images&q=bomb+bluetooth+speaker
Including by HAMA, which is a decently big peripheral / hardware brand in Europe.
(Of course, I don't know if they all have bomb as a device name. But I'm sure some do.)
-
Claude Opus 4.8
As an aside, some of the renders have only a single side connection to the wheel and that is a valid bike design, the Cannondale Lefty front fork only has a left leg:
https://duckduckgo.com/?q=cannondale+lefty&iar=images&t=ffab
- Jony Ive's Ferrari
- Search engines alternatives now that Google isn't Google anymore
- Yt-dlp – [Announcement] Bun support is now limited and deprecated
-
AI is killing the cheap smartphone
A new addition to my shill list - Ulefone. Rugged, heavy on features and (still) reasonably priced. Pics: https://duckduckgo.com/?ia=images&origin=funnel_home_website...
Other underappreciated+awesome handset brands: Doogee, Blackview
- A Nicer Voltmeter Clock
What are some alternatives?
arena-ai-leaderboards - 📊 Daily auto-updated snapshots of all Arena AI (LMSYS Chatbot Arena) leaderboards — LLM, Vision, Code, Video, Image & more. Structured JSON with historical tracking.
torsocks - Library to torify application - NOTE: upstream has been moved to https://gitweb.torproject.org/torsocks.git