Python llm-evaluation

Open-source Python projects categorized as llm-evaluation

Top 3 Python llm-evaluation Projects

  • continuous-eval

    Open-Source Evaluation for GenAI Application Pipelines

  • Project mention: Show HN: Ellipsis – Automated PR reviews and bug fixes | news.ycombinator.com | 2024-05-09

    Hi HN, hunterbrooks and nbrad here from Ellipsis (https://www.ellipsis.dev). Ellipsis automatically reviews your PRs when opened and on each new commit. If you tag @ellipsis-dev in a comment, it can make changes to the PR (via direct commit or side PR) and answer questions, just like a human.

    Demo video: https://www.youtube.com/watch?v=X61NGZpaNQA

    So far, we have dozens of open source projects and companies using Ellipsis. We seem to have landed in a kind of sweet spot where there’s a good match between the current capabilities of AI tools and the actual needs of software engineers - this doesn’t replace human review, but it saves you time by catching/fixing lots of small silly stuff.

    Here’s an example in the wild: https://github.com/relari-ai/continuous-eval/pull/38, where Ellipsis (1) adds a PR summary; (2) finds a bug and adds a review comment; (3) after a (human) user comments, generates a side PR with the fix; and (4) after a (human) user merges the side PR and adds another commit, re-reviews the PR and approves it.

    Here’s another example: https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-..., where Ellipsis adds several comments with inline suggestions that were directly merged by the developer.

    You can configure Ellipsis in natural language to enforce custom rules, style guides, or conventions. For example, here’s how the `jxnl/instructor` repo uses natural language rules to make sure that docs are kept in sync: https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L..., and here’s an example PR that Ellipsis came up with based on those rules: https://github.com/jxnl/instructor/pull/346.

    Don’t worry, your code is never stored or used to train models (https://docs.ellipsis.dev/security).

    Installing into your repo takes 2 clicks at https://www.ellipsis.dev. We’d really appreciate your feedback, thoughts, and ideas!

  • superpipe

    Superpipe - optimized LLM pipelines for structured data

  • Project mention: Show HN: Superpipe – optimized LLM pipelines for structured outputs | news.ycombinator.com | 2024-03-26
  • CommonGen-Eval

    Evaluating LLMs with CommonGen-Lite

  • Project mention: Evaluating LLMs with CommonGen-Lite | news.ycombinator.com | 2024-01-08

    Leaderboard: https://github.com/allenai/CommonGen-Eval?tab=readme-ov-file...

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python llm-evaluation related posts

  • Launch HN: Relari (YC W24) – Identify the root cause of problems in LLM apps

    1 project | news.ycombinator.com | 8 Mar 2024

Index

What are some of the best open-source llm-evaluation projects in Python? This list will help you:

#  Project          Stars
1  continuous-eval  328
2  superpipe        98
3  CommonGen-Eval   80
