frouros vs uptrain

| | frouros | uptrain |
|---|---|---|
| Mentions | 5 | 35 |
| Stars | 164 | 2,029 |
| Growth | 4.9% | 5.1% |
| Activity | 9.3 | 9.6 |
| Latest commit | 6 days ago | 5 days ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | Apache License 2.0 |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
frouros
uptrain
- Evaluation of OpenAI Assistants
Currently seeking feedback on the tool. I would love it if you could check it out at: https://github.com/uptrain-ai/uptrain/blob/main/examples/assistants/assistant_evaluator.ipynb
- Integrating Spade: Synthesizing Assertions for LLMs into My OSS Project
d. Using an integer-programming optimizer to find the optimal evaluation set with maximum coverage while respecting failure, accuracy, and subsumption constraints
Their results are impressive. You can look at the SPADE paper for more details: https://arxiv.org/pdf/2401.03038.pdf
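The selection step in (d) can be illustrated with a toy sketch. SPADE formulates it as an integer program; below is only a greedy set-cover approximation of the same idea, with all names and data invented for illustration:

```python
# Toy sketch of the assertion-selection idea: pick a small set of candidate
# assertions that still covers all observed failure cases. SPADE solves this
# with an integer-programming optimizer; this greedy approximation is only
# illustrative (assertion names and failure ids below are made up).

def select_assertions(coverage, failures):
    """Greedily pick assertions until every failure case is covered.

    coverage: dict mapping assertion name -> set of failure ids it catches
    failures: set of all failure ids that must be caught
    """
    chosen, uncovered = [], set(failures)
    while uncovered:
        # Pick the assertion catching the most still-uncovered failures.
        best = max(coverage, key=lambda a: len(coverage[a] & uncovered))
        if not coverage[best] & uncovered:
            break  # remaining failures cannot be covered at all
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen, uncovered

coverage = {
    "no_profanity": {1, 2},
    "valid_json": {3},
    "cites_source": {2, 3, 4},
}
chosen, missed = select_assertions(coverage, {1, 2, 3, 4})
# "cites_source" covers {2, 3, 4}, then "no_profanity" covers the rest
```

A real integer program would additionally encode the failure, accuracy, and subsumption constraints mentioned above; the greedy version only captures the coverage objective.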
2. Running these evaluations reliably is tricky: Using LLMs as evaluators has recently emerged as a promising alternative to human evaluation and has proven quite effective in improving the accuracy of LLM applications. However, running these evals reliably, i.e. with high correlation to human judgments and stability across multiple runs, is still difficult. UpTrain is an open-source framework for evaluating LLM applications that provides high-quality scores. It lets you define custom evaluations via the GuidelineAdherence check, where you specify any custom guideline in plain English and check whether the LLM follows it. It also provides an easy interface to run these evaluations on production responses with a single API call, so you can systematically use frameworks like UpTrain to catch wrong LLM outputs.
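The "custom guideline in plain English" pattern can be sketched in a few lines. The judge is injected as a callable so the sketch stays runnable offline; in UpTrain this role is played by its GuidelineAdherence check, and every name below is invented for illustration, not UpTrain's actual API:

```python
# Minimal sketch of a guideline-adherence check: ask an LLM judge whether
# a response follows a plain-English guideline, and majority-vote over
# several runs for stability. All names here are hypothetical.

def guideline_adherence(response, guideline, judge, n_runs=3):
    """Return True if the judge says 'yes' in a majority of runs."""
    prompt = (
        f"Guideline: {guideline}\n"
        f"Response: {response}\n"
        "Does the response follow the guideline? Answer yes or no."
    )
    votes = [
        judge(prompt).strip().lower().startswith("yes")
        for _ in range(n_runs)
    ]
    return sum(votes) > n_runs / 2

# A stub judge standing in for a real LLM call:
def stub_judge(prompt):
    return "yes" if "refund" in prompt else "no"

ok = guideline_adherence(
    "We offer a full refund within 30 days.",
    "Always mention the refund policy.",
    stub_judge,
)
```

Running the judge multiple times and voting is one simple way to address the run-to-run stability concern mentioned above.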
I am one of the maintainers of UpTrain, and we recently integrated the SPADE framework into our open-source repo (https://github.com/uptrain-ai/uptrain/). The idea is simple:
- Sharing learnings from evaluating Million+ LLM responses
b. Task-dependent: tonality match with the given persona, creativity, interestingness, etc. Your prompt can play a big role here.
3. Evaluating Reasoning Capabilities: Includes dimensions like logical correctness (right conclusions), logical robustness (consistency under minor input changes), logical efficiency (shortest solution path), and common-sense understanding (grasping common concepts). One can't do much here beyond prompting techniques like CoT; performance primarily depends on the LLM chosen.
4. Custom Evaluations: Many applications require customized metrics tailored to their specific needs, e.g. adherence to custom guidelines, checks for certain keywords, etc.
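A custom evaluation of the "check for certain keywords" kind is often just a deterministic function over the response. A minimal sketch (the helper name and example strings are invented, not part of UpTrain):

```python
# Illustrative custom evaluation: verify that a response contains all
# required keywords and none of the forbidden ones. Purely hypothetical
# helper, shown only to make the idea of custom checks concrete.

def keyword_check(response, required=(), forbidden=()):
    """Return any missing required keywords and any forbidden hits."""
    text = response.lower()
    return {
        "missing": [k for k in required if k.lower() not in text],
        "violations": [k for k in forbidden if k.lower() in text],
    }

report = keyword_check(
    "Please contact support to request a refund.",
    required=["refund"],
    forbidden=["guarantee"],
)
# An empty report means the response passes this custom check.
```

Checks like this complement LLM-judged metrics: they are cheap, deterministic, and easy to run on every production response.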
You can read the full blog here (https://uptrain.ai/blog/how-to-evaluate-your-llm-applications). Hope you find it useful. I am one of the developers of UpTrain, an open-source package to evaluate LLM applications (https://github.com/uptrain-ai/uptrain).
Would love to get feedback from the HN community.
- Show HN: UpTrain (YC W23) – open-source tool to evaluate LLM response quality
- Introducing UpTrain - Open-source LLM evaluator
Open-source repo: https://github.com/uptrain-ai/uptrain
- Launching UpTrain - an open-source LLM testing tool to help check the performance of your LLM applications
You can check out the project at https://github.com/uptrain-ai/uptrain - we would love to hear feedback from the community.
- [P] A Practical Guide to Enhancing Models for Custom Use-cases
- [D] Any options for using GPT models with proprietary data?
I am building an open-source project that helps collect high-quality retraining datasets for fine-tuning LLMs. Check out https://github.com/uptrain-ai/uptrain
- [D] Should we draw inspiration from the deep learning/computer vision world for fine-tuning LLMs?
P.S. I am building an open-source project, UpTrain (https://github.com/uptrain-ai/uptrain), which helps data scientists do so. We just wrote a blog on how this principle can be applied to fine-tune an LLM for a conversation summarization task. Check it out here: https://github.com/uptrain-ai/uptrain/tree/main/examples/coversation_summarization
- Show HN: UpTrain – A Practical Approach to Finetuning LLMs for Custom Use-Cases
What are some alternatives?
nannyml - nannyml: post-deployment data science in python
lora - Using Low-rank adaptation to quickly fine-tune diffusion models.
river - Online machine learning in Python
stanford_alpaca - Code and documentation to train Stanford's Alpaca models, and generate the data.
pytorch-forecasting - Time series forecasting with PyTorch
aim - Aim: an easy-to-use & supercharged open-source experiment tracker.
deepchecks - Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.