Launch HN: Sweep (YC S23) – A bot to create simple PRs in your codebase

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • sweep

    Sweep: open-source AI-powered Software Developer for small features and bug fixes.

  • Hi HN! We’re William and Kevin, cofounders of Sweep (https://sweep.dev/). Sweep is an open-source AI-powered junior developer. You describe a feature or bugfix in a GitHub issue and Sweep writes a pull request with code. You can see some examples here: https://docs.sweep.dev/examples.

    Kevin and I met while working at Roblox. We talked to our friends who were junior developers and noticed a lot of them doing grunt work. We wanted to let them focus on important work. Copilot is great, but we realized some tasks could be completely offloaded to an AI (e.g. adding a banner to your webpage https://github.com/sweepai/landing-page/issues/225).

    Sweep does this with a code search engine. We use code chunking, ranking, and formatting tricks to represent your codebase in a token-efficient manner for LLMs. You might have seen our blog on code chunking here: https://news.ycombinator.com/item?id=36948403.

    We take these fetched code snippets and come up with a plan to write the PR. We found that having the LLM provide structured information using XML tags is very robust, as it’s easy for us to parse with regex, has good support for multi-line answers and is hard for the LLM to mess up.

    This is because XML is common in the LLM’s training data (the internet / HTML), and the opening and closing tags rarely appear naturally in text and code, unlike the quotations, brackets, backticks and newlines used by JSON’s and markdown’s delimiters. Further, XML lets you skip the preamble (“This question has to do with xyz. Here is my answer:”) and handles multi-line answers like PR plans and code really well. For example, we ask the LLM for the new code in tags and a final boolean answer by writing True.

    We use this XML format to get the LLM to create a plan, generating a list of files to create and modify from the retrieved relevant files. We iterate through the file changes and edit/create the necessary files. Finally, we push the commits to GitHub and create the PR.

    We’ve been using Sweep to handle small issues in Sweep’s own repo (it recently passed 100 commits). We’ve become well acquainted with its limitations. For example, Sweep sometimes leave unimplemented functions with just “# rest of code” since it runs on GPT-4, a model tuned for chatting. Other times, there’s minor syntax errors or undefined variables. This is why we spend the other half of our time building self-recovery methods for Sweep to fix and test its PRs.

    First, we invite the developer to review and add comments to Sweep’s pull request. This helps to a point, but Sweep’s code sometimes wouldn’t lint. This is table stakes. It’s frustrating to have to tell the bot to “add an import here” or “this variable is undefined”.

  • landing-page

  • Hi HN! We’re William and Kevin, cofounders of Sweep (https://sweep.dev/). Sweep is an open-source AI-powered junior developer. You describe a feature or bugfix in a GitHub issue and Sweep writes a pull request with code. You can see some examples here: https://docs.sweep.dev/examples.

    Kevin and I met while working at Roblox. We talked to our friends who were junior developers and noticed a lot of them doing grunt work. We wanted to let them focus on important work. Copilot is great, but we realized some tasks could be completely offloaded to an AI (e.g. adding a banner to your webpage https://github.com/sweepai/landing-page/issues/225).

    Sweep does this with a code search engine. We use code chunking, ranking, and formatting tricks to represent your codebase in a token-efficient manner for LLMs. You might have seen our blog on code chunking here: https://news.ycombinator.com/item?id=36948403.

    We take these fetched code snippets and come up with a plan to write the PR. We found that having the LLM provide structured information using XML tags is very robust, as it’s easy for us to parse with regex, has good support for multi-line answers and is hard for the LLM to mess up.

    This is because XML is common in the LLM’s training data (the internet / HTML), and the opening and closing tags rarely appear naturally in text and code, unlike the quotations, brackets, backticks and newlines used by JSON’s and markdown’s delimiters. Further, XML lets you skip the preamble (“This question has to do with xyz. Here is my answer:”) and handles multi-line answers like PR plans and code really well. For example, we ask the LLM for the new code in tags and a final boolean answer by writing True.

    We use this XML format to get the LLM to create a plan, generating a list of files to create and modify from the retrieved relevant files. We iterate through the file changes and edit/create the necessary files. Finally, we push the commits to GitHub and create the PR.

    We’ve been using Sweep to handle small issues in Sweep’s own repo (it recently passed 100 commits). We’ve become well acquainted with its limitations. For example, Sweep sometimes leave unimplemented functions with just “# rest of code” since it runs on GPT-4, a model tuned for chatting. Other times, there’s minor syntax errors or undefined variables. This is why we spend the other half of our time building self-recovery methods for Sweep to fix and test its PRs.

    First, we invite the developer to review and add comments to Sweep’s pull request. This helps to a point, but Sweep’s code sometimes wouldn’t lint. This is table stakes. It’s frustrating to have to tell the bot to “add an import here” or “this variable is undefined”.

  • SurveyJS

    Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.

    SurveyJS logo
  • landing-

  • To make this better, we used GitHub Actions, which automatically runs the flow of “check the code → tell sweep → sweep fixes the code → check the code again”. We like this flow because you might already have GitHub Actions, and it’s fully configurable. Check out this blog to learn more https://docs.sweep.dev/blogs/giving-dev-tools.

    So far, Sweep isn’t that fast, can’t handle massive problems yet, and doesn’t write hundreds of lines of code. We’re excited to work towards that. In the meantime, a lot of our users have been able to get useful results. For example, a user reported that an app was not working correctly on Windows, and Sweep wrote the PR at https://github.com/sweepai/sweep/pull/368/files, replacing all occurrences of "/tmp" with "tempfile.gettempdir()". Other examples include adding a validation function for Github branch name (https://github.com/sweepai/sweep/pull/461) and adding dynamically generated initials in the testimonials on our landing page (https://github.com/wwzeng1/landing- page/issues/28). For more examples, checkout https://docs.sweep.dev/examples.

    Our focus is on finding ways that an AI dev can actually help and not just be a novelty. I think of my daily capacity to write good code as a stamina bar. There’s a fixed cost to opening an IDE, finding the right lines of code, and making changes. If you’re working on a big feature and have to context switch, the cost is higher. I’ve been leaving the small changes to Sweep, and my stamina bar stays full for longer.

    Our repo is at https://github.com/sweepai/sweep, there’s a demo video at https://www.youtube.com/watch?v=WBVna_ow8vo, and you can install Sweep here: https://github.com/apps/sweep-ai. We currently have a freemium model, with 5 GPT-4 PRs at the free tier, 120 GPT-4 PRs at the paid tier and unlimited at the enterprise tier.

    We’re far from our vision of a full AI software engineer, but we’re excited to work on it with the community feedback :). Looking forward to hearing any of your thoughts!

  • sweep-ai

  • To make this better, we used GitHub Actions, which automatically runs the flow of “check the code → tell sweep → sweep fixes the code → check the code again”. We like this flow because you might already have GitHub Actions, and it’s fully configurable. Check out this blog to learn more https://docs.sweep.dev/blogs/giving-dev-tools.

    So far, Sweep isn’t that fast, can’t handle massive problems yet, and doesn’t write hundreds of lines of code. We’re excited to work towards that. In the meantime, a lot of our users have been able to get useful results. For example, a user reported that an app was not working correctly on Windows, and Sweep wrote the PR at https://github.com/sweepai/sweep/pull/368/files, replacing all occurrences of "/tmp" with "tempfile.gettempdir()". Other examples include adding a validation function for Github branch name (https://github.com/sweepai/sweep/pull/461) and adding dynamically generated initials in the testimonials on our landing page (https://github.com/wwzeng1/landing- page/issues/28). For more examples, checkout https://docs.sweep.dev/examples.

    Our focus is on finding ways that an AI dev can actually help and not just be a novelty. I think of my daily capacity to write good code as a stamina bar. There’s a fixed cost to opening an IDE, finding the right lines of code, and making changes. If you’re working on a big feature and have to context switch, the cost is higher. I’ve been leaving the small changes to Sweep, and my stamina bar stays full for longer.

    Our repo is at https://github.com/sweepai/sweep, there’s a demo video at https://www.youtube.com/watch?v=WBVna_ow8vo, and you can install Sweep here: https://github.com/apps/sweep-ai. We currently have a freemium model, with 5 GPT-4 PRs at the free tier, 120 GPT-4 PRs at the paid tier and unlimited at the enterprise tier.

    We’re far from our vision of a full AI software engineer, but we’re excited to work on it with the community feedback :). Looking forward to hearing any of your thoughts!

  • synclinear.com

    End-to-end sync of Linear and GitHub.

  • Yup! We used to use https://synclinear.com/ but you can also use Zapier to automatically redirect Linear issues to GitHub. It's a nice experience since we also had a Discord to Linear hook.

  • 3dstreet

    🚲🚶🚌 Web-based 3D visualization of streets using A-Frame

  • github-pr-summary

    Use ChatGPT to summarize & review GitHub Pull Requests

  • I'm wondering what will happen if we let ChatGPT review these PRs created by ChatGPT.

    Yes, We made a small tool to help developer review their PR. Seems a great supplement for Sweep AI.

    Build your own PR review bot in 3 minutes here: https://github.com/flows-network/github-pr-summary

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • schemalint

    Lint database schemas

  • I tried Sweep the other day and I got a weird error: https://github.com/kristiandupont/schemalint/issues/286 -- "an issue has occurred around fetching the files." What files might that be?

    In any case, congratulations on launching, your product looks really promising!

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts