WizardLM
LocalAI
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
WizardLM
- FLaNK AI-April 22, 2024
-
Refact LLM: New 1.6B code model reaches 32% HumanEval and is SOTA for the size
This is interesting work, and a good contribution, but there is no need to mislead people.
[1] https://github.com/nlpxucan/WizardLM
-
Continue with LocalAI: An alternative to GitHub's Copilot that runs everything locally
If you pair this with the latest WizardCoder models, which have a fairly better performance than the standard Salesforce Codegen2 and Codegen2.5, you have a pretty solid alternative to GitHub Copilot that runs completely locally.
- WizardCoder context?
- The world's most-powerful AI model suddenly got 'lazier' and 'dumber.' A radical redesign of OpenAI's GPT-4 could be behind the decline in performance.
-
Official WizardLM-13B-V1.1 Released! Train with Only 1K Data! Can Achieve 86.32% on AlpacaEval!
(We will update the demo links in our github.)
-
GPT-4 API general availability
In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 word) It's quite zippy. llama.cpp performs close on Nvidia GPUs now (but they don't have a handy chart) and you can get decent performance on 13B models on M1/M2 Macs.
You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.
That being said, personally I mostly use GPT-4 for code assistance to that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine tune the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]) - I've only just started playing around with it since replit model tooling is not as good as llamas (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).
I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)
[1] https://github.com/turboderp/exllama#results-so-far
[2] https://github.com/aigoopy/llm-jeopardy
[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...
[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder
-
WizardLM-13B-V1.0-Uncensored
You talking about this? https://github.com/nlpxucan/WizardLM
-
What 7b llm to use
The smallest model that is close to competent at code is WizardCoder 15B.. https://github.com/nlpxucan/WizardLM/
-
16-Jun-2023
WizardCoder: Empowering Code Large Language Models with Evol-Instruct (https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder)
LocalAI
- LocalAI: Self-hosted OpenAI alternative reaches 2.14.0
- Drop-In Replacement for ChatGPT API
- Voxos.ai – An Open-Source Desktop Voice Assistant
- Ask HN: Set Up Local LLM
- FLaNK Stack Weekly 11 Dec 2023
- Is there any open source app to load a model and expose API like OpenAI?
-
What do you use to run your models?
If you're running this as a server, I would recommend LocalAI https://github.com/mudler/LocalAI
-
OpenAI Switch Kit: Swap OpenAI with any open-source model
LocalAI can do that: https://github.com/mudler/LocalAI
https://localai.io/features/openai-functions/
-
"ChatGPT romanesc"
De inspirație, LocalAI, un replacement la OpenAI. E deja hot pe GitHub.
-
Local LLM's to run on old iMac / Hardware
Your hardware should be fine for inferencing, as long as you don't bother trying to get the GPU working.
My $0.02 would be to try getting LocalAI running on your machine with OpenCL/CLBlas acceleration for your CPU. If you're running other things, you could limit the inferencing process to 2 or 3 threads. That should get it working; I've been able to inference even 13b models on cheap Rockchip SOCs. Your CPU should be fine, even if it's a little outdated.
LocalAI: https://github.com/mudler/LocalAI
Some decent models to start with:
TinyLlama (extremely small/fast): https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGU...
Dolphin Mistral (larger size, better responses: https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF
What are some alternatives?
private-gpt - Interact with your documents using the power of GPT, 100% privately, no data leaks
gpt4all - gpt4all: run open-source LLMs anywhere
llm-humaneval-benchmarks
ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
llama-cpp-python - Python bindings for llama.cpp
airoboros - Customizable implementation of the self-instruct paper.
promptfoo - Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
can-ai-code - Self-evaluating interview for AI coders
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.