Top 3 Python llm-evaluation Projects

continuous-eval

4 328 8.7 Python

Open-Source Evaluation for GenAI Application Pipelines

Project mention: Show HN: Ellipsis – Automated PR reviews and bug fixes | news.ycombinator.com | 2024-05-09

Hi HN, hunterbrooks and nbrad here from Ellipsis (https://www.ellipsis.dev). Ellipsis automatically reviews your PRs when opened and on each new commit. If you tag @ellipsis-dev in a comment, it can make changes to the PR (via direct commit or side PR) and answer questions, just like a human.
Demo video: https://www.youtube.com/watch?v=X61NGZpaNQA
So far, we have dozens of open source projects and companies using Ellipsis. We seem to have landed in a kind of sweet spot where there’s a good match between the current capabilities of AI tools and the actual needs of software engineers - this doesn’t replace human review, but it saves you time by catching/fixing lots of small silly stuff.
Here’s an example in the wild: https://github.com/relari-ai/continuous-eval/pull/38, where Ellipsis (1) adds a PR summary; (2) finds a bug and adds a review comment; (3) after a (human) user comments, generates a side PR with the fix; and (4) after a (human) user merges the side PR and adds another commit, re-reviews the PR and approves it.
Here’s another example: https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-..., where Ellipsis adds several comments with inline suggestions that were directly merged by the developer.
You can configure Ellipsis in natural language to enforce custom rules, style guides, or conventions. For example, here’s how the `jxnl/instructor` repo uses natural language rules to make sure that docs are kept in sync: https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L..., and here’s an example PR that Ellipsis came up with based on those rules: https://github.com/jxnl/instructor/pull/346.
Don’t worry, your code is never stored or used to train models (https://docs.ellipsis.dev/security).
Installing into your repo takes 2 clicks at https://www.ellipsis.dev. We’d really appreciate your feedback, thoughts, and ideas!

superpipe

1 98 8.9 Python

Superpipe - optimized LLM pipelines for structured data

Project mention: Show HN: Superpipe – optimized LLM pipelines for structured outputs | news.ycombinator.com | 2024-03-26

CommonGen-Eval

2 80 7.8 Python

Evaluating LLMs with CommonGen-Lite

Project mention: Evaluating LLMs with CommonGen-Lite | news.ycombinator.com | 2024-01-08

Leaderboard: https://github.com/allenai/CommonGen-Eval?tab=readme-ov-file...

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python llm-evaluation related posts

Launch HN: Relari (YC W24) – Identify the root cause of problems in LLM apps

1 project | news.ycombinator.com | 8 Mar 2024

Index

What are some of the best open-source llm-evaluation projects in Python? This list will help you:

#	Project	Stars
1	continuous-eval	328
2	superpipe	98
3	CommonGen-Eval	80
