can-ai-code VS developer

Compare can-ai-code vs developer and see their differences.

developer

the first library to let you embed a developer agent in your own app! (by smol-ai)
              can-ai-code    developer
Mentions      30             37
Stars         471            11,700
Growth        -              0.5%
Activity      9.5            7.2
Last commit   9 days ago     2 months ago
Language      Python         Python
License       MIT License    MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

can-ai-code

Posts with mentions or reviews of can-ai-code. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-01.
  • Ask HN: Code Llama 70B on a dedicated server
    1 project | news.ycombinator.com | 1 Mar 2024
    You can run a Q4 quant of a 70B model in about 40GB of RAM (+context). Your single-user (batch size 1, bs=1) inference speed will be basically memory-bottlenecked, so on a dual-channel dedicated box you'd expect somewhere around 1 token/s. And that's generation; prefill/prompt processing will take even longer (as your chat history grows) on CPU. So it falls into the realm of the technically possible, but not for real-world use.
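    (Editorial sketch, not from the original comment: the back-of-envelope math behind those numbers. The 0.5 bytes/weight for a Q4 quant and the ~50 GB/s dual-channel bandwidth figure are assumptions.)

```python
# Rough decode-speed math for a 70B model at Q4 on a CPU box.
params = 70e9                  # model parameters
bytes_per_param = 0.5          # ~4 bits per weight for a Q4 quant (assumption)
weights_gb = params * bytes_per_param / 1e9   # ~35 GB; ~40 GB with context

# Each generated token streams all weights through RAM once, so
# tokens/s is roughly memory bandwidth / resident model size.
bandwidth_gbs = 50             # typical dual-channel DDR4/DDR5 (assumption)
print(weights_gb)              # 35.0
print(bandwidth_gbs / 40)      # ~1.25, i.e. about 1 token/s
```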

    If you're looking specifically for CodeLlama 70B, Artificial Analysis https://artificialanalysis.ai/models/codellama-instruct-70b/... lists Perplexity, Together.ai, Deep Infra, and Fireworks as potential hosts, with Together.ai and Deep Infra at about $0.9/1M tokens, about 30 tokens/s, and about 300ms latency (time to first token).

    For those looking specifically for local coding models, I keep a list of LLM coding evals here: https://llm-tracker.info/evals/Code-Evaluation

    On the EvalPlus Leaderboard, there are about 10 open models that rank higher than CodeLlama 70B, all of them smaller models: https://evalplus.github.io/leaderboard.html

    A few other evals (worth cross-referencing to counter contamination and overfitting):

    * CRUXEval Leaderboard https://crux-eval.github.io/leaderboard.html

    * CanAiCode Leaderboard https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...

    * Big Code Models Leaderboard https://huggingface.co/spaces/bigcode/bigcode-models-leaderb...

    From the various leaderboards, deepseek-ai/deepseek-coder-33b-instruct still looks like the best-performing open model (it has a very liberal ethical license), followed by ise-uiuc/Magicoder-S-DS-6.7B (a deepseek-coder-6.7b-base fine-tune). The former can be run as a Q4 quant on a single 24GB GPU (a used 3090 should run you about $700 at the moment), and the latter, if it works for you, will run 4X faster and fit on even cheaper/weaker GPUs.
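    (Editorial sketch: loading a Q4 GGUF quant of deepseek-coder-33b-instruct on a single 24GB GPU with llama-cpp-python. The model filename is a placeholder for whatever quant you download.)

```python
from llama_cpp import Llama

# A Q4_K_M quant of a 33B model is roughly 20 GB, so it fits in 24 GB of VRAM.
# The path below is a placeholder; point it at your downloaded GGUF file.
llm = Llama(
    model_path="./deepseek-coder-33b-instruct.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=4096,        # context window
)

result = llm(
    "Write a Python function that checks whether a string is a palindrome.",
    max_tokens=256,
    temperature=0.2,
)
print(result["choices"][0]["text"])
```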

    There are always recent developments, but two are worth pointing out:

    OpenCodeInterpreter - a new system, fine-tuned from the DeepSeek code models, that uses execution feedback and outperforms the ChatGPT-4 Code Interpreter: https://opencodeinterpreter.github.io/

    StarCoder2-15B just dropped and also looks competitive. Announcement and relevant links: https://huggingface.co/blog/starcoder2

  • Meta AI releases Code Llama 70B
    6 projects | news.ycombinator.com | 29 Jan 2024
    This is a completely fair but open question. Not to be a typical HN user, but when you say SOTA local, the question is really which benchmarks you care about for evaluation: size, operability, complexity, explainability, etc.

    Working out which copilot models perform best has been a deep exercise for me, and it has really made me evaluate my own coding style: what I find important and what I look out for, both when investigating models and when evaluating interview candidates.

    I think the three benchmarks & leaderboards most people go to are:

    https://huggingface.co/spaces/bigcode/bigcode-models-leaderb... - the most widely understood, broad language-capability leaderboard, relying on well-established evaluations and benchmarks.

    https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul... - Also comprehensive, but primarily assesses Python and JavaScript.

    https://evalplus.github.io/leaderboard.html - which I think is a better take on comparing models you intend to run locally, as you can evaluate performance, operability and size in one visualisation.

    Best of luck and I would love to know which models & benchmarks you choose and why.

  • Stable Code 3B: Coding on the Edge
    7 projects | news.ycombinator.com | 16 Jan 2024
    Here is a leaderboard of some models:

    https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...

    Don't know how biased this leaderboard is, but I guess you could just give some of them a try and see for yourself.

  • Mistral has an even more powerful model in the prototype phase
    1 project | /r/LocalLLaMA | 11 Dec 2023
    - Can AI Code? - https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
  • Assessing LLMs for code generation
    1 project | /r/LocalLLaMA | 5 Dec 2023
    Check out https://github.com/the-crypt-keeper/can-ai-code for some ideas. I'd love to see more shootouts like this, especially if they were spread across a few different languages.
  • Show HN: LlamaGPT – Self-hosted, offline, private AI chatbot, powered by Llama 2
    12 projects | news.ycombinator.com | 16 Aug 2023
    Very cool, this looks like a combination of chatbot-ui and llama-cpp-python? A similar project I've been using is https://github.com/serge-chat/serge. Nous-Hermes-Llama2-13b is my daily driver and scores high on coding evaluations (https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...).
  • How Is LLaMa.cpp Possible?
    11 projects | news.ycombinator.com | 15 Aug 2023
    I have several sets of quant comparisons posted on my HF spaces, the caveat is my prompts are all "English to code": https://huggingface.co/spaces/mike-ravkine/can-ai-code-compa...

    The dropdown at the top selects which comparison: Falcon compares GGML, Vicuna compares bitsandbytes. I have some more comparisons planned; feel free to open an issue if you'd like to see something specific: https://github.com/the-crypt-keeper/can-ai-code

  • Ask HN: Who is using small OS LLMs in production?
    2 projects | news.ycombinator.com | 2 Aug 2023
    Yeah, it seemed suspiciously high for HumanEval, and it only ranks 14th for JS and 7th for Python on other benchmarks now: https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...

    WizardCoder is a bit of a problem: since it's not Llama 1/2 based but its own 15B model, support for it in anything practical is near nonexistent. WizardLM v1.2 looks like it may be worth checking out.

  • Recent updates on the LLM Explorer (15,000+ LLMs listed)
    1 project | /r/LocalLLaMA | 12 Jul 2023
    There are at least 4 different types of quants floating around HF (bitsandbytes, GGML, GPTQ and AWQ), so I don't know if a "GGML" column makes sense vs. a more abstract way of linking quants to their base models. I am doing this and it's fucking awful: https://github.com/the-crypt-keeper/can-ai-code/blob/main/models/models.yaml
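    (Editorial sketch of the "more abstract way of linking quants to their base models" the comment wishes for. The YAML schema and repo names here are hypothetical illustrations, not the actual models.yaml format.)

```python
import yaml

# Hypothetical schema: each base model lists its quantized artifacts, so any
# quant column (GGML, GPTQ, AWQ, ...) can be derived instead of hand-curated.
DOC = """
deepseek-coder-33b-instruct:
  quants:
    - {format: GGUF, bits: 4, repo: example/deepseek-coder-33B-GGUF}
    - {format: GPTQ, bits: 4, repo: example/deepseek-coder-33B-GPTQ}
WizardCoder-15B:
  quants:
    - {format: AWQ, bits: 4, repo: example/WizardCoder-15B-AWQ}
"""

models = yaml.safe_load(DOC)
# Invert the mapping: quant repo -> (base model, format).
base_of = {
    q["repo"]: (base, q["format"])
    for base, info in models.items()
    for q in info["quants"]
}
print(base_of["example/WizardCoder-15B-AWQ"])  # ('WizardCoder-15B', 'AWQ')
```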
  • Did anyone try to benchmark LLM's for coding against each other and against proprietary ones like Copilot X?
    2 projects | /r/LocalLLaMA | 5 Jul 2023
    Ah I meant this one but I see now it's WIP.

developer

Posts with mentions or reviews of developer. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-31.
  • DeepSeek Coder: Let the Code Write Itself
    3 projects | news.ycombinator.com | 31 Jan 2024
    > much of the work is repetitive, but it comes with its edge cases that we need to look out for

    Then don't use AI for it.

    Bluntly.

    This is a poor use-case; it doesn't matter what model you use, you'll get a disappointing result.

    These are the domains where using AI coding currently shines:

    1) You're approaching a new, well-established domain (e.g. building an Android app in Kotlin), and you already know how to build things / apps, but not that exact domain.

    Example: How do I do X but for an android app in kotlin?

    2) You're building out a generic scaffold for a project and need some tedious (but generic) work done.

    Example: https://github.com/smol-ai/developer

    3) You have a standard, but specific question regarding your code, and although related Q/A answers exist, nothing seems to specifically target the issue you're having.

    Example: My nginx configuration is giving me [SPECIFIC ERROR] for [CONFIG FILE]. What's wrong and how can I fix it?

    The domains where it does not work are:

    1) You have some generic code with domain/company/whatever specific edge cases.

    The edge cases, broadly speaking, no matter how well documented, will not be handled well by the model.

    Edge cases are exactly that: edge cases. The common corpus of 'how to x' material does not cover them; they will not be handled, and the results will require you to review and complete them manually.

    2) You have some specific piece of code you want to refactor 'to solve xxx', but the code is not covered well by tests.

    LLMs struggle to refactor existing code, and the difficulty is proportional to the code length. There are technical reasons for this (mainly the randomness of token sampling), but tl;dr: it's basically a crapshoot.

    Might work. Might not. If you have no tests who knows? You have to manually verify both the new functionality and the old functionality, but maybe it helps a bit, at scale, for trivial problems.

    3) You're doing some obscure BS or using a new library / new version of the library.

    The LLM will have no context for this, and will generate rubbish / old deprecated content.

    ...

    So. Concrete advice:

    1) sigh~

    > a friend of mine came and suggested that I use Retrieval-Augmented Generation (RAG), I have yet to try it, with a setup Langchain + Ollama.

    Ignore this advice. RAG and langchain are not the solutions you are looking for.

    2) Use a normal coding assistant like copilot.

    This is the most effective way to use AI right now.

    There are some frameworks that let you use open-source models if you don't want to use OpenAI.

    3) Do not attempt to bulk generate code.

    AI coding isn't at that level. Right now, the tooling is primitive, and large scale coherent code generation is... not impossible, but it is difficult (see below).

    You will be more effective using an existing proven path that uses 'copilot' style helpers.

    However...

    ...if you do want to pursue code generation, here's a broad blueprint to follow:

    - decompose your task into steps

    - decompose your steps into functions

    - generate or write tests and function definitions

    - generate an api specification (eg. .d.ts file) for your function definitions

    - for each function definition, generate the code for the function passing the api specification in as the context. eg. "Given functions x, y, z with the specs... ; generate an implementation of q that does ...".

    - repeatedly generate multiple outputs for the above until you get one that passes the tests you wrote.

    This approach broadly scales to reasonably complex problems, so long as you partition your problem into module-sized chunks (a minimal sketch of the core loop follows below).

    I personally like to put something like "you're building a library/package to do xxx" or "as a one file header" at the top level of the prompt, as it seems to tap into the 'this should be isolated and a package' style of output.
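    (Editorial sketch of the core loop from the blueprint above: generate candidates against the API spec and keep the first one that passes the tests. The generate() function is a placeholder for any LLM client.)

```python
import subprocess
import tempfile
from pathlib import Path

def generate(prompt: str) -> str:
    """Placeholder for your LLM call (OpenAI API, llama-cpp, etc.)."""
    raise NotImplementedError

API_SPEC = "def q(xs: list[int]) -> list[int]: ..."  # the spec you generated
TESTS = (
    "from candidate import q\n"
    "def test_q():\n"
    "    assert q([3, 1, 2]) == [1, 2, 3]\n"
)

def generate_until_green(max_attempts: int = 5) -> str | None:
    prompt = (f"Given functions with the specs:\n{API_SPEC}\n"
              "Generate an implementation of q that sorts its input.")
    for _ in range(max_attempts):
        code = generate(prompt)
        with tempfile.TemporaryDirectory() as d:
            Path(d, "candidate.py").write_text(code)
            Path(d, "test_candidate.py").write_text(TESTS)
            # Run the tests you wrote against this candidate.
            green = subprocess.run(["pytest", "-q"], cwd=d).returncode == 0
        if green:
            return code  # first passing candidate wins
    return None
```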

  • Did I accidentally automate myself out of the job?
    1 project | /r/OpenAI | 1 Dec 2023
    check out smol-developer (https://github.com/smol-ai/developer)
  • Ask HN: How can ChatGPT be effectively utilized in the work
    4 projects | news.ycombinator.com | 17 Oct 2023
    4. https://github.com/smol-ai/developer

    How can ChatGPT be effectively utilized for reading library source code, resolving coding issues, and serving as a dedicated coding assistant tailored for a specific programming language?

  • Bootstrap a React app with smol developer
    1 project | dev.to | 26 Sep 2023
    The smol developer AI tool was built by a developer called Swyx using ChatGPT. This library is designed to act like a personal junior developer, performing a huge array of simple, routine tasks as well as some more sophisticated ones. By providing a spec in a prompt, you can even use smol developer to pair program with an AI tool!
  • Outsmarting AI 🤖🧠 The hack for generating fully-functional web apps
    5 projects | dev.to | 22 Aug 2023
    And this is where most of these tools fall short, with tools like Smol-Developer creating decent client and server code that work great on their own, but unfortunately don’t work together!
  • Ask HN: Which GPT-powered coding assistants exist?
    4 projects | news.ycombinator.com | 6 Aug 2023
    1) Show HN: Bloop – Answer questions about your code with an LLM agent (github.com/bloopai)

    https://news.ycombinator.com/item?id=36260961

    2) https://github.com/paul-gauthier/aider

    3) Show HN: GPT Repo Loader – load entire code repos into GPT prompts (github.com/mpoon)

    https://news.ycombinator.com/item?id=35191303

    4) https://github.com/smol-ai/developer

    5) codium

    6) copilot

    7) using gpt in the playground / chatgpt

    8) jam.dev/jamgpt

    9) magic.dev

    10) https://github.com/kristoferlund/duet-gpt

    Which ones am I missing?

  • How to add an AI Code Copilot to your product using GPT4
    4 projects | news.ycombinator.com | 4 Aug 2023
    I had this same idea and started working on something for this purpose called j-dev [0]. It started as a fork of smol-dev [1], which basically gets GPT to write your entire project from scratch. Then you would have to iterate on the prompt to nuke everything and re-write it, filling in increasingly complicated statements like "oh, except in this function make sure you return a promise".

    j-dev is basically a CLI where it gives a prompt similar to the one in the parent article. You start with a prompt and the CLI fills in the directory contents (excluding gitignore). Then it requests access to the files it thinks it wants. And then it can edit, delete or add files or ask for followup based on your response.

    It also addresses the problem that a lot of these tools eat up way too many tokens, so a single prompt to something like smol-dev would eat up a few dollars on every iteration.

    It's still very much a work in progress and I'll probably do a Show HN next week, but I would love some feedback.

    [0] https://github.com/breeko/j-dev

    [1] https://github.com/smol-ai/developer
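
    (Editorial sketch of j-dev's first step as described above: building a prompt from the directory contents while excluding gitignored files. The pathspec library usage is standard; the prompt wording is illustrative.)

```python
from pathlib import Path
import pathspec  # pip install pathspec

def list_repo_files(root: str) -> list[str]:
    """List project files, skipping .git and anything matched by .gitignore."""
    root_path = Path(root)
    gitignore = root_path / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    spec = pathspec.PathSpec.from_lines("gitwildmatch", lines)
    return sorted(
        str(p.relative_to(root_path))
        for p in root_path.rglob("*")
        if p.is_file()
        and ".git" not in p.parts
        and not spec.match_file(str(p.relative_to(root_path)))
    )

# The CLI would then ask the model which of these files it wants to read.
prompt = (
    "The project contains these files:\n"
    + "\n".join(list_repo_files("."))
    + "\nWhich files do you need to see to complete the task?"
)
```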

  • Smol AI 🐣 vs Wasp AI 🐝- Which is the Better AI Junior Developer?
    2 projects | dev.to | 1 Aug 2023
    Smol AI’s “Smol-Developer” gained a lot of attention very quickly by being one of the first such tools on the scene. It is a simple set of Python scripts that lets a user build prototype apps using natural language in an iterative approach.
  • Ai create entire project
    3 projects | /r/ChatGPTPro | 10 Jul 2023
  • In five years, there will be no programmers left, believes Stability AI CEO
    4 projects | /r/singularity | 3 Jul 2023

What are some alternatives?

When comparing can-ai-code and developer you can also consider the following projects:

llm-humaneval-benchmarks

gpt-engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.

WizardLM - Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder and WizardMath

sweep - Sweep: open-source AI-powered Software Developer for small features and bug fixes.

ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.

aider - aider is AI pair programming in your terminal

openchat - OpenChat: Advancing Open-source Language Models with Imperfect Data

refact - WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding

Local-LLM-Comparison-Colab-UI - Compare the performance of different LLM that can be deployed locally on consumer hardware. Run yourself with Colab WebUI.

gpt-pilot - The first real AI developer

text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

MetaGPT - 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
