SWE-bench Alternatives
Similar projects and alternatives to SWE-bench
llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, along with a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
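As a rough illustration of the PEFT (LoRA) fine-tuning that llama-recipes builds on, here is a minimal sketch; the model name and hyperparameters are illustrative assumptions, not values taken from the repo:

```python
# Minimal PEFT (LoRA) fine-tuning sketch; illustrative only, not llama-recipes code.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM checkpoint works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config = LoraConfig(
    r=8,                                  # rank of the low-rank adapter matrices
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```

Freezing the base weights and training only small adapters is what makes fine-tuning large models tractable on modest hardware; FSDP then extends the same recipe across multiple GPUs or nodes.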
granite-snack-cookbook
Granite Snack Cookbook -- easily consumable recipes (Python notebooks) that showcase the capabilities of the Granite models.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
monocle
Monocle is a framework for tracing GenAI app code. This repo contains the Monocle implementation for GenAI apps written in Python. (by monocle2ai)
SWE-bench discussion
SWE-bench reviews and mentions
Show HN: Agent Benchmark Repository and Viewer
We have created a public registry of AI agent benchmarks and agent runtime traces to help everyone better understand how AI agents work and fail these days.
Our team and many agent builders we talked to wanted a better way of viewing what agents in these benchmarks do, e.g., how a particular coding agent approaches SWE-bench (https://www.swebench.com/). Right now this is difficult for two reasons: benchmark traces are scattered across many different websites, and they are hard to read, often being huge raw JSON dumps of the agent's activity.
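As a hypothetical illustration of that readability problem, a raw trace dump can be flattened into a step-by-step transcript; the file name and schema keys ("steps", "action", "observation") are assumptions, since trace formats vary across benchmarks:

```python
# Hypothetical example: render a raw agent-trace JSON dump as a readable transcript.
# The file name and schema ("steps"/"action"/"observation") are assumed, not a real format.
import json

with open("swe_bench_trace.json") as f:
    trace = json.load(f)

for i, step in enumerate(trace.get("steps", []), start=1):
    print(f"--- step {i}: {step.get('action', '?')} ---")
    print(str(step.get("observation", ""))[:200])  # truncate long tool output
```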
Show HN: LLM Function Calling Library to Interact with File, Shell, Git and Code
- Codebase Q&A Agent: Enables natural language interactions with the codebase.
To better showcase the SWE kit's capability, we tested it on [swebench.com](https://www.swebench.com/), the benchmark for testing coding agents. It scored 48.60%, whereas Devin scored only 13.86%.
If you end up using this, please do provide feedback, and if you need help building a coding agent, feel free to reach out to us.
I (Soham) & my cofounder Karan are both active on this thread to answer any questions!
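For context, SWE-bench task instances can be pulled from the Hugging Face Hub, which is the usual starting point for the kind of evaluation described above; the field access below assumes the dataset's published schema:

```python
# Load the official SWE-bench test split from the Hugging Face Hub.
from datasets import load_dataset

swe_bench = load_dataset("princeton-nlp/SWE-bench", split="test")
print(len(swe_bench), "task instances")

task = swe_bench[0]
print(task["instance_id"])              # per-task identifier, e.g. "<repo>-<issue number>"
print(task["problem_statement"][:300])  # the GitHub issue text the agent must resolve
```

Agents are scored on the fraction of task instances where their generated patch makes the repository's tests pass, which is the percentage figure quoted above.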
AIM Weekly for 04Nov2024
This week's topics: Composed Image Retrieval; Intro to Multimodal Llama 3.2; Multi-Agent Concierge; RAG with LangChain, Granite, Milvus; Download content; Transformer Replacement?; vLLM for running models; Amphion; AutoGluon; Notebook Llama (like Google's NotebookLM); Monocle2ai for tracing GenAI app code; LFA&D Project; Bee Agent Framework; Llama RFP Response; GenAIScript; Simular AI Agent S; DrawDB with AI; Ollama with Llama 3.2 Vision (preview); Powerful RAG Checker; SQL Generator; Role of LLMs; Document Extraction; Open Source Vector DB Reddit; The Practical Guide to Self-Hosting LLMs; Stagehand Controller; Understanding HNSWLIB; Best Practices in RAG; Enigma Agent; LangChain, Ollama, Phi-3 for Function Calling; Compass Judger; Princeton NLP SimPO; Princeton NLP ProLong; Princeton NLP HELMET; Ollama Cheatsheet; Princeton NLP CopyCat; Princeton NLP Shp; Can LLMs Solve Hard GitHub Issues?; Enabling Large Language Models to Generate Text with Citations; Princeton NLP CharXiv; Awesome AI Agents List; Nomic's Matryoshka text embedding model.
Stats
princeton-nlp/SWE-bench is an open-source project licensed under the MIT License, an OSI-approved license.
The primary programming language of SWE-bench is Python.