| | ragrank | promptfoo |
|---|---|---|
| Mentions | 1 | 21 |
| Stars | 23 | 3,238 |
| Growth | - | 17.7% |
| Activity | 9.5 | 9.9 |
| Latest commit | 21 days ago | 3 days ago |
| Language | Python | TypeScript |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ragrank
I created Ragrank 🎯 - an open-source ecosystem for evaluating LLM and RAG applications.
Feel free to contribute on GitHub 💚
promptfoo
- Iterate on LLMs Faster
- Google CodeGemma: Open Code Models Based on Gemma [pdf]
- AI Infrastructure Landscape
- Promptfoo – Testing and Evaluation for LLMs
- Show HN: Prompt-Engineering Tool: AI-to-AI Testing for LLM
Super interesting. We've been experimenting with [promptfoo](https://github.com/promptfoo/promptfoo) at my work, and this looks very similar.
- GitHub – promptfoo/promptfoo: Test your prompts
- I asked 60 LLMs a set of 20 questions
In case anyone's interested in running their own benchmark across many LLMs, I've built a generic harness for this at https://github.com/promptfoo/promptfoo.
I encourage people considering LLM applications to test the models on their _own data and examples_ rather than extrapolating general benchmarks.
This library supports OpenAI, Anthropic, Google, Llama and CodeLlama, any model on Replicate, and any model on Ollama out of the box. As an example, I wrote up a benchmark comparing GPT model censorship with Llama models here: https://promptfoo.dev/docs/guides/llama2-uncensored-benchmar.... Hope this helps someone.
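For anyone wanting to try this, a promptfoo run is driven by a YAML config that pairs prompts with providers and per-test assertions. A minimal sketch is below; the provider IDs and the sample question are illustrative placeholders, not part of the benchmark described above:

```yaml
# promptfooconfig.yaml - minimal sketch (providers and test case are examples)
prompts:
  - "Answer concisely: {{question}}"

providers:
  - openai:gpt-4o-mini   # swap in any supported provider ID
  - ollama:llama3

tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains
        value: "Paris"
```

Running `promptfoo eval` in the same directory then executes every prompt against every provider and scores each output against the assertions.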
- Ask HN: Prompt Manager for Developers
- DeepEval – Unit Testing for LLMs
- Show HN: Knit – A Better LLM Playground
What are some alternatives?
shap-e - Generate 3D objects conditioned on text or images
prompt-engineering - Tips and tricks for working with Large Language Models like OpenAI's GPT-4.
litellm - Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
WizardLM - Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder and WizardMath
ChainForge - An open-source visual programming environment for battle-testing prompts to LLMs.
chat-ui - Open source codebase powering the HuggingChat app
WizardVicunaLM - LLM that combines the principles of wizardLM and vicunaLM
agenta - The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
evals - Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
openplayground - An LLM playground you can run on your laptop
sparsegpt - Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Auto-GPT-MetaTrader-Plugin - The AutoGPT MetaTrader Plugin is a software tool that enables traders to connect their MetaTrader 4 or 5 trading account to Auto-GPT.