Top 15 Python RLHF Projects
- Open-Assistant: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
- argilla: Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
- safe-rlhf: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
- ImageReward: [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
- distilabel: ⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
- HALOs: A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
- Cornucopia-LLaMA-Fin-Chinese: Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, together with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
- TextRL: An implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) for any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
- cogment-verse: Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)
- opening-up-chatgpt.github.io: Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
For Open Assistant, the inference code is here: https://github.com/LAION-AI/Open-Assistant/tree/main/inference
It depends on what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models, there's a table of VRAM requirements for fine-tuning at [1], which says you could do the most basic type of fine-tuning on a 7B-parameter model with 8GB of VRAM.
You'll find that training takes quite a long time, and since most of the GPU's capacity goes to training, your computer's responsiveness will suffer: even basic things like scrolling in your web browser or changing tabs use the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
Here’s another one - it’s older but has some interesting charts and graphs.
https://arxiv.org/abs/2303.18223
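To get intuition for numbers like the 8GB figure above, here is a back-of-envelope helper for the weight-only part of the memory budget (illustrative only: real VRAM usage also includes gradients, optimizer states, and activations, so treat the linked table as the better guide):

```python
def weight_memory_gib(n_params_billion, bits_per_param):
    """GiB needed just to hold the model weights at a given precision.

    Ignores gradients, optimizer states, and activations, which can
    easily double or triple the total during fine-tuning.
    """
    return n_params_billion * 1e9 * bits_per_param / 8 / 2**30

# A 7B model: ~13 GiB in fp16, but only ~3.3 GiB quantized to 4-bit,
# which is why QLoRA-style fine-tuning can fit on an 8GB card.
fp16_gib = weight_memory_gib(7, 16)
int4_gib = weight_memory_gib(7, 4)
```

The gap between those two numbers is most of the reason quantized fine-tuning methods exist.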
Project mention: Open-Source Data Collection Platform for LLM Fine-Tuning and RLHF | news.ycombinator.com | 2023-06-05
I'm Dani, CEO and co-founder of Argilla.
Happy to answer any questions you might have and excited to hear your thoughts!
More about Argilla
GitHub: https://github.com/argilla-io/argilla
Project mention: [R] Meet Beaver-7B: a Constrained Value-Aligned LLM via Safe RLHF Technique | /r/MachineLearning | 2023-05-16
Project mention: Results of finetuning Avalon TRUvision v2 with image scoring | /r/StableDiffusion | 2023-05-17
I used the ImageReward repo to score generated images during training and modified the loss function to take the score into account.
Project mention: Open-source AI Feedback framework for scalable LLM Alignment | news.ycombinator.com | 2023-11-23
If you are using no-code solutions, increasing how often an "idea" appears in a dataset will make that idea more likely to show up in outputs.
If you are fine-tuning your own LLM, there are other ways to get your idea to appear. In the literature this is sometimes called RLHF or preference optimization, and here are a few approaches:
Direct Preference Optimization
This learns directly from pairwise preferences, using a Bradley-Terry-style model closely related to the Elo scores used in chess and basketball to rank individuals who compete in pairs.
@argilla_io on X.com has been doing some work in evaluating DPO.
Here is a decent thread on this: https://x.com/argilla_io/status/1745057571696693689?s=20
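For concreteness, here is a minimal, framework-free sketch of the DPO loss for a single preference pair (the log-probabilities and `beta` are placeholder inputs for illustration, not values from any particular library):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses."""
    # Implicit rewards: how much the policy up-weights each response
    # relative to the frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): low when the policy already prefers the
    # chosen response, high when it prefers the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this pushes the policy toward the preferred response, with `beta` controlling how far it may drift from the reference model.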
Identity Preference Optimization
IPO is research from Google DeepMind. It removes the reliance on Elo-style pairwise scores to address overfitting issues in DPO.
Paper (thread): https://x.com/kylemarieb/status/1728281581306233036?s=20
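As a rough sketch (placeholder log-probabilities, not any library's API): IPO regresses the log-likelihood-ratio margin toward a fixed target 1/(2τ) instead of pushing it arbitrarily high, which is what mitigates the overfitting:

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, tau=0.1):
    """Simplified IPO loss for one (chosen, rejected) pair."""
    # Log-likelihood-ratio margin between chosen and rejected,
    # measured relative to the frozen reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Squared error toward 1/(2*tau): unlike a sigmoid loss, making
    # the margin ever larger is itself penalized.
    return (margin - 1.0 / (2.0 * tau)) ** 2
```

The quadratic keeps the optimizer from exploiting a few easy pairs, which is the failure mode IPO targets in DPO.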
Kahneman-Tversky Optimization
KTO is an approach that uses unpaired ("mono") preference data: for example, it asks whether a response is simply "good or not." This is helpful in a lot of real-world situations (e.g. "Is the restaurant well liked?").
Here is a brief discussion on it:
https://x.com/ralphbrooks/status/1744840033872330938?s=20
Here is more on KTO:
* Paper: https://github.com/ContextualAI/HALOs/blob/main/assets/repor...
* Code: https://github.com/ContextualAI/HALOs
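As a rough illustration of the "good or not" idea, here is a simplified single-example sketch with placeholder inputs (the actual HALOs implementation also estimates the reference point `z_ref` from a batch, which is omitted here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp, ref_logp, desirable, z_ref=0.0,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """KTO-style loss for a single unpaired example (simplified)."""
    # Reward: how much the policy up-weights this response vs. the
    # frozen reference model.
    reward = beta * (policy_logp - ref_logp)
    if desirable:
        # "Good" examples: push the reward above the reference point.
        return lambda_d * (1.0 - sigmoid(reward - z_ref))
    # "Bad" examples: push the reward below the reference point.
    return lambda_u * (1.0 - sigmoid(z_ref - reward))
```

Because each example carries only a binary label, no pairing of responses is needed, which matches the unpaired data KTO is designed for.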
Project mention: Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count:263.0 | /r/algoprojects | 2023-07-31
Python RLHF related posts
- Recipes to align LLMs with AI feedback
- Tracking Openness of Instruction-Tuned LLMs
- What on-demand GPU service would you recommend to do fine-tuning of 7B models?
- Opening up ChatGPT: tracking “open source” LLM and RLHF architectures
- Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count:263.0
- Llama and ChatGPT Are Not Open-Source
- Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count:221.0
Index
What are some of the best open-source RLHF projects in Python? This list will help you find them:
# | Project | Stars |
---|---|---|
1 | Open-Assistant | 36,622 |
2 | LLaMA-Factory | 17,050 |
3 | LLMSurvey | 8,716 |
4 | alignment-handbook | 3,744 |
5 | argilla | 3,108 |
6 | WebGLM | 1,506 |
7 | safe-rlhf | 1,149 |
8 | ImageReward | 938 |
9 | distilabel | 825 |
10 | HALOs | 541 |
11 | Cornucopia-LLaMA-Fin-Chinese | 521 |
12 | TextRL | 519 |
13 | chain-of-hindsight | 205 |
14 | cogment-verse | 73 |
15 | opening-up-chatgpt.github.io | 64 |