SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues? (by princeton-nlp)

SWE-bench Alternatives

Similar projects and alternatives to SWE-bench

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better SWE-bench alternative or higher similarity.

SWE-bench reviews and mentions

Posts with mentions or reviews of SWE-bench. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-11-04.
  • Show HN: Agent Benchmark Repository and Viewer
    1 project | news.ycombinator.com | 26 Nov 2024
    We have created a public registry of AI agent benchmarks and agent runtime traces to help everyone better understand how AI agents work and fail these days.

    Our team and many agent builders we talked to wanted a better way to view what agents do in these benchmarks, e.g., how a particular coding agent approaches SWE-bench (https://www.swebench.com/). Right now, this is difficult for two reasons: benchmark traces are scattered across many different websites, and they are really hard to read, since they are often huge raw JSON dumps.

  • Show HN: LLM Function Calling Library to Interact with File, Shell, Git and Code
    1 project | news.ycombinator.com | 6 Nov 2024
    - Codebase Q&A Agent: Enables natural language interactions with the codebase.

    To better showcase the SWE kit's capability, we tested it on [swebench.com](https://www.swebench.com/), the benchmark for testing coding agents. It scored 48.60%, whereas Devin scored only 13.86%.

    If you end up using this, please do provide feedback, and if you need help building a coding agent, feel free to reach out to us.

    I (Soham) & my cofounder Karan are both active on this thread to answer any questions!

  • AIM Weekly for 04Nov2024
    29 projects | dev.to | 4 Nov 2024
    Composed Image Retrieval; Intro to Multimodal LLama 3.2; Multi Agent Concierge; RAG with Langchain Granite, Milvus; Download content; Transformer Replacement?; vLLM for running models; Amphion; Autogluon; Notebook LLama like Google's Notebook LLM; Monocle2ai for tracing GenAI app code LFA&D Project; Bee Agent Framework; LLama RFP Response; GenAI Script; Simular AI Agent S; DrawDB with AI; Ollama with LLama 3.2 Vision!!!! Preview; Powerful RAG Checker; SQL Generator; Role of LLMs; Document Extraction; Open Source Vector DB Reddit; The Practical Guide to Self Hosting LLM; Stagehand Controller; Understanding HNSWLIB; Best practices in RAG; Enigma Agent; Langchain, Ollama, Phi3 for Function Calling; Compass Judger; Princeton NLP SimPO; Princeton NLP ProLong; Princeton NLP HELMET; Ollama Cheatsheet; Princeton NLP CopyCat; Princeton NLP Shp; Can LLM Solve Hard Github Issues; Enabling Large Language Models to Generate Text with Citations; Princeton NLP CharXiv; Awesome AI Agents List; Nomic's Matryoshka text embedding model
Stats

Basic SWE-bench repo stats
Mentions: 3
Stars: 2,041
Activity: 9.5
Last commit: 11 days ago