llm-vscode vs promptfoo

| | llm-vscode | promptfoo |
|---|---|---|
| Mentions | 4 | 21 |
| Stars | 1,143 | 3,100 |
| Growth | 7.6% | 28.1% |
| Activity | 6.3 | 9.9 |
| Last Commit | 6 days ago | about 17 hours ago |
| Language | TypeScript | TypeScript |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llm-vscode
- LLM Powered Development for VSCode
- Google CodeGemma: Open Code Models Based on Gemma [pdf]
https://github.com/huggingface/llm-vscode
`"llm.backend": "ollama",` (a fuller settings sketch follows this list)
- Best open-source & local alternatives to GitHub Copilot for data science notebooks
- LLM VS Code from Hugging Face
- Code completion VSCode extension for OSS models
promptfoo
- Google CodeGemma: Open Code Models Based on Gemma [pdf]
- AI Infrastructure Landscape
- Promptfoo – Testing and Evaluation for LLMs
- Show HN: Prompt-Engineering Tool: AI-to-AI Testing for LLM
Super interesting. We've been experimenting with [promptfoo](https://github.com/promptfoo/promptfoo) at my work, and this looks very similar.
- GitHub – promptfoo/promptfoo: Test your prompts
- I asked 60 LLMs a set of 20 questions
In case anyone's interested in running their own benchmark across many LLMs, I've built a generic harness for this at https://github.com/promptfoo/promptfoo.
I encourage people considering LLM applications to test the models on their _own data and examples_ rather than extrapolating general benchmarks.
This library supports OpenAI, Anthropic, Google, Llama and CodeLlama, any model on Replicate, any model on Ollama, etc. out of the box. As an example, I wrote up a benchmark comparing GPT model censorship with Llama models here: https://promptfoo.dev/docs/guides/llama2-uncensored-benchmar.... Hope this helps someone. (A minimal config sketch follows this list.)
- Ask HN: Prompt Manager for Developers
- DeepEval – Unit Testing for LLMs
- Show HN: Knit – A Better LLM Playground
- Show HN: CLI for testing and evaluating LLM outputs
What are some alternatives?
jupyter-ai - A generative AI extension for JupyterLab
shap-e - Generate 3D objects conditioned on text or images
prompt-engineering - Tips and tricks for working with Large Language Models like OpenAI's GPT-4.
WizardLM - Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder and WizardMath
chat-ui - Open source codebase powering the HuggingChat app
litellm - Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
ChainForge - An open-source visual programming environment for battle-testing prompts to LLMs.
WizardVicunaLM - LLM that combines the principles of WizardLM and VicunaLM
evals - Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
openplayground - An LLM playground you can run on your laptop
agenta - The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
sparsegpt - Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".