Uptrain Alternatives
Similar projects and alternatives to uptrain
-
deepchecks
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.
uptrain reviews and mentions
-
Evaluation of OpenAI Assistants
Currently seeking feedback on the tool. Would love it if you could check it out: https://github.com/uptrain-ai/uptrain/blob/main/examples/assistants/assistant_evaluator.ipynb
-
Integrating Spade: Synthesizing Assertions for LLMs into My OSS Project
d. Using an integer-programming optimizer to find an evaluation set that maximizes coverage while respecting failure, accuracy, and subsumption constraints
Their results are impressive. You can look at the SPADE paper for more details: https://arxiv.org/pdf/2401.03038.pdf
2. Running these evaluations reliably is tricky: Recently, using LLMs as evaluators has emerged as a promising alternative to human evaluation and has proven quite effective at improving the accuracy of LLM applications. However, it is still difficult to run these evals reliably, i.e. with high correlation to human judgments and stability across multiple runs. UpTrain is an open-source framework for evaluating LLM applications that provides high-quality scores. It lets you define custom evaluations via the GuidelineAdherence check, where you specify any guideline in plain English and check whether the LLM follows it. It also provides an easy interface to run these evaluations on production responses with a single API call, so you can systematically use frameworks like UpTrain to catch wrong LLM outputs.
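The guideline-adherence idea can be sketched as a generic LLM-as-judge check in pure Python. To be clear, the function names and the stubbed judge below are my illustration of the pattern, not UpTrain's actual API:

```python
import json

def build_judge_prompt(guideline: str, response: str) -> str:
    """Ask a judge model whether `response` follows the plain-English guideline."""
    return (
        f"You are an evaluator. Guideline: {guideline}\n"
        f"Response to check:\n{response}\n"
        'Reply with JSON: {"follows_guideline": true|false, "reason": "..."}'
    )

def check_guideline(guideline: str, response: str, judge) -> dict:
    """Run one guideline-adherence evaluation; `judge` maps a prompt to a JSON string."""
    verdict = json.loads(judge(build_judge_prompt(guideline, response)))
    return {
        "guideline": guideline,
        "score": 1.0 if verdict["follows_guideline"] else 0.0,
        "explanation": verdict["reason"],
    }

# Stubbed judge so the sketch runs without an API key; a real judge
# would send the prompt to an LLM instead.
def stub_judge(prompt: str) -> str:
    return json.dumps({"follows_guideline": False,
                       "reason": "response promises a refund"})

result = check_guideline("Never promise a refund.", "Sure, we will refund you!", stub_judge)
print(result["score"])  # 0.0
```

Stability across runs then comes down to running the same check several times (or with a temperature-0 judge) and comparing verdicts.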
I am one of the maintainers of UpTrain, and we recently integrated the SPADE framework into our open-source repo (https://github.com/uptrain-ai/uptrain/). The idea is simple:
-
Sharing learnings from evaluating Million+ LLM responses
b. Task Dependent: Tonality match with the given persona, creativity, interestingness, etc. Your prompt can play a big role here
3. Evaluating Reasoning Capabilities: Includes dimensions like logical correctness (right conclusions), logical robustness (consistency under minor input changes), logical efficiency (shortest solution path), and common-sense understanding (grasping common concepts). One can't do much here beyond prompting techniques like CoT; performance primarily depends upon the LLM chosen.
4. Custom Evaluations: Many applications require customized metrics tailored to their specific needs. You want adherence to custom guidelines, check for certain keywords, etc.
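One concrete form of the custom evaluations in point 4, checking responses for required or banned keywords, can be sketched in a few lines. The function name and return format below are mine, not any specific framework's API:

```python
def keyword_check(response: str, required=(), banned=()) -> dict:
    """Score a response on keyword rules: 1.0 only if every required term
    appears and no banned term does (case-insensitive substring match)."""
    text = response.lower()
    missing = [w for w in required if w.lower() not in text]
    banned_found = [w for w in banned if w.lower() in text]
    return {
        "score": 1.0 if not missing and not banned_found else 0.0,
        "missing_required": missing,
        "banned_found": banned_found,
    }

r = keyword_check(
    "Please see our pricing page for details.",
    required=["pricing"],
    banned=["guarantee"],
)
print(r["score"])  # 1.0: "pricing" present, "guarantee" absent
```

Guideline-style checks in plain English cover the fuzzier cases; deterministic rules like this one are cheap to run on every production response.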
You can read the full blog here (https://uptrain.ai/blog/how-to-evaluate-your-llm-applications). Hope you find it useful. I am one of the developers of UpTrain, an open-source package to evaluate LLM applications (https://github.com/uptrain-ai/uptrain).
Would love to get feedback from the HN community.
- Show HN: UpTrain (YC W23) – open-source tool to evaluate LLM response quality
-
Introducing UpTrain - Open-source LLM evaluator
Open-source repo: https://github.com/uptrain-ai/uptrain
-
Launching UpTrain - an open-source LLM testing tool to help check the performance of your LLM applications
You can check out the project at https://github.com/uptrain-ai/uptrain. Would love to hear feedback from the community!
- [P] A Practical Guide to Enhancing Models for Custom Use-cases
-
[D] Any options for using GPT models using proprietary data ?
I am building an open-source project that helps collect high-quality retraining datasets for fine-tuning LLMs. Check out https://github.com/uptrain-ai/uptrain
-
[D] Should we draw inspiration from Deep learning/Computer vision world for fine-tuning LLMs?
P.S. I am building an open-source project, UpTrain (https://github.com/uptrain-ai/uptrain), which helps data scientists do so. We just wrote a blog on how this principle can be applied to fine-tune an LLM for a conversation summarization task. Check it out here: https://github.com/uptrain-ai/uptrain/tree/main/examples/coversation_summarization
- Show HN: UpTrain – A Practical Approach to Finetuning LLMs for Custom Use-Cases
-
Stats
uptrain-ai/uptrain is an open-source project licensed under the Apache License 2.0, an OSI-approved license.
The primary programming language of uptrain is Python.