Python RLHF

Open-source Python projects categorized as RLHF

Top 15 Python RLHF Projects

  • Open-Assistant

Open-Assistant is a chat-based assistant that understands tasks, interacts with third-party systems, and retrieves information dynamically in order to complete them.

  • Project mention: Best open source AI chatbot alternative? | /r/opensource | 2023-12-08

    For open assistant, the code: https://github.com/LAION-AI/Open-Assistant/tree/main/inference

  • LLaMA-Factory

    Unify Efficient Fine-Tuning of 100+ LLMs

  • Project mention: Show HN: GPU Prices on eBay | news.ycombinator.com | 2024-02-23

    Depends what model you want to train, and how well you want your computer to keep working while you're doing it.

If you're interested in large language models there's a table of VRAM requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.

    You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs uses the GPU, after all.

    Spend a bit more and you'll probably have a better time.

    [1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
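The 8 GB figure above is worth sanity-checking with a back-of-envelope memory estimate. The sketch below uses common rules of thumb (16-bit weights plus gradients plus fp32 Adam moments for full fine-tuning; 4-bit weights plus a small set of 16-bit adapters for QLoRA-style tuning) and deliberately ignores activations, which depend on batch size and sequence length. The exact numbers are assumptions, not LLaMA-Factory's figures.

```python
# Back-of-envelope VRAM estimate for fine-tuning a model with n_params parameters.
# Activation memory is excluded; it scales with batch size and sequence length.

def full_finetune_gb(n_params: float) -> float:
    # 2 bytes weights + 2 bytes grads + 8 bytes Adam moments (fp32 m and v) per param
    return n_params * (2 + 2 + 8) / 1e9

def qlora_gb(n_params: float, adapter_fraction: float = 0.01) -> float:
    # ~0.5 byte/param for 4-bit base weights, plus 16-bit adapters
    # (with their own grads and optimizer states) on ~1% of the parameters
    base = n_params * 0.5 / 1e9
    adapters = n_params * adapter_fraction * (2 + 2 + 8) / 1e9
    return base + adapters

print(f"7B full fine-tune: ~{full_finetune_gb(7e9):.0f} GB")
print(f"7B QLoRA:          ~{qlora_gb(7e9):.1f} GB")
```

Under these assumptions a 7B model needs on the order of 84 GB for full fine-tuning but only a few GB for quantized adapter tuning, which is why the "most basic type" fits in 8 GB.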

  • LLMSurvey

    The official GitHub page for the survey paper "A Survey of Large Language Models".

  • Project mention: Ask HN: Textbook Regarding LLMs | news.ycombinator.com | 2024-03-23

    Here’s another one - it’s older but has some interesting charts and graphs.

    https://arxiv.org/abs/2303.18223

  • alignment-handbook

    Robust recipes to align language models with human and AI preferences

  • Project mention: Recipes to align LLMs with AI feedback | news.ycombinator.com | 2024-03-03
  • argilla

Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.

  • Project mention: Open-Source Data Collection Platform for LLM Fine-Tuning and RLHF | news.ycombinator.com | 2023-06-05

    I'm Dani, CEO and co-founder of Argilla.

    Happy to answer any questions you might have and excited to hear your thoughts!

    More about Argilla

    GitHub: https://github.com/argilla-io/argilla

  • WebGLM

    WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

  • Project mention: WebGLM: Web-Enhanced Q&A with LLMs | news.ycombinator.com | 2023-06-22
  • safe-rlhf

    Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

  • Project mention: [R] Meet Beaver-7B: a Constrained Value-Aligned LLM via Safe RLHF Technique | /r/MachineLearning | 2023-05-16
  • ImageReward

    [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

  • Project mention: Results of finetuning Avalon TRUvision v2 with image scoring | /r/StableDiffusion | 2023-05-17

I used the ImageReward repo to score generated images during training and modified the loss function to take the score into account.
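The commenter doesn't show how the score was folded into the loss, so the following is only a plausible sketch: down-weighting the per-sample training loss by a sigmoid of the reward score. Both the function name and the weighting scheme are hypothetical, not the repo's actual implementation.

```python
import math

def reward_weighted_loss(base_losses, scores, temperature=1.0):
    """Hypothetical: scale each sample's loss by a sigmoid of its reward score.

    base_losses: per-sample training losses
    scores: reward-model scores for the corresponding generated images
    """
    weights = [1 / (1 + math.exp(-s / temperature)) for s in scores]
    return sum(w * l for w, l in zip(weights, base_losses)) / len(base_losses)
```

With this choice, samples the reward model dislikes (negative score) contribute less to the gradient than samples it likes.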

  • distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.

  • Project mention: Open-source AI Feedback framework for scalable LLM Alignment | news.ycombinator.com | 2023-11-23
  • HALOs

    A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

  • Project mention: On Sleeper Agent LLMs | news.ycombinator.com | 2024-01-13

If you are using no-code solutions, increasing the frequency of an "idea" in a dataset will make that idea more likely to appear in outputs.

    If you are fine-tuning your own LLM, there are other ways to get your idea to appear. In the literature this is sometimes called RLHF or preference optimization, and here are a few approaches:

    Direct Preference Optimization

This learns from pairwise preferences via the Bradley-Terry model, a relative of the Elo ratings used in chess and basketball to rank individuals who compete head-to-head.

    @argilla_io on X.com has been doing some work in evaluating DPO.

    Here is a decent thread on this: https://x.com/argilla_io/status/1745057571696693689?s=20
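The DPO objective described above can be sketched in a few lines. The inputs are summed token log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model; beta controls how far the policy may drift from the reference. This is a minimal scalar version of the published loss, not a training-ready implementation.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards: how much more likely the policy makes each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Bradley-Terry negative log-likelihood: -log sigmoid(margin)
    return -math.log(1 / (1 + math.exp(-margin)))
```

A policy that already ranks the chosen response above the rejected one (relative to the reference) gets a small loss; one with the ranking reversed gets a large loss.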

    Identity Preference Optimization

IPO is research from Google DeepMind. It replaces DPO's objective with a regularized one to address DPO's tendency to overfit the preference data.

    Paper: https://x.com/kylemarieb/status/1728281581306233036?s=20

    Kahneman-Tversky Optimization

KTO is an approach that uses unpaired preference data: instead of comparing two responses, it only asks whether a single response is "good or not." This is helpful for a lot of real-world situations (e.g. "Is the restaurant well liked?").

    Here is a brief discussion on it:

    https://x.com/ralphbrooks/status/1744840033872330938?s=20

    Here is more on KTO:

    * Paper: https://github.com/ContextualAI/HALOs/blob/main/assets/repor...

    * Code: https://github.com/ContextualAI/HALOs
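To make the contrast with DPO concrete, here is a simplified scalar sketch of a KTO-style loss over unpaired "good or not" labels. In the paper the reference point is a KL estimate between policy and reference model; fixing it to 0 here is a simplifying assumption, so treat this as an illustration of the loss shape rather than the HALOs implementation.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def kto_loss(policy_logp, ref_logp, desirable, beta=0.1, z0=0.0):
    # Implicit reward for a single response, as in DPO.
    r = beta * (policy_logp - ref_logp)
    if desirable:
        return 1 - sigmoid(r - z0)   # push the reward above the reference point
    return 1 - sigmoid(z0 - r)       # push the reward below it
```

Because each example carries its own label, no paired chosen/rejected responses are needed; a response the policy already favors yields a loss below 0.5 when labeled desirable and above 0.5 when labeled undesirable.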

  • Cornucopia-LLaMA-Fin-Chinese

Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, together with an efficient, lightweight training framework for the vertical domain (pretraining, SFT, RLHF, quantization, etc.)

  • Project mention: Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count:263.0 | /r/algoprojects | 2023-07-31
  • TextRL

Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

  • chain-of-hindsight

    Chain-of-Hindsight, A Scalable RLHF Method

  • cogment-verse

    Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)

  • opening-up-chatgpt.github.io

    Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.

  • Project mention: Tracking Openness of Instruction-Tuned LLMs | news.ycombinator.com | 2023-12-24
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source RLHF projects in Python? This list will help you find them:

Project Stars
1 Open-Assistant 36,622
2 LLaMA-Factory 17,050
3 LLMSurvey 8,716
4 alignment-handbook 3,744
5 argilla 3,108
6 WebGLM 1,506
7 safe-rlhf 1,149
8 ImageReward 938
9 distilabel 825
10 HALOs 541
11 Cornucopia-LLaMA-Fin-Chinese 521
12 TextRL 519
13 chain-of-hindsight 205
14 cogment-verse 73
15 opening-up-chatgpt.github.io 64
