Python llm-training

Open-source Python projects categorized as llm-training

Top 6 Python llm-training Projects

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

  • Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

    This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.

    Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress toward this area!

  • skypilot

    SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

  • Project mention: Ask HN: Most efficient way to fine-tune an LLM in 2024? | news.ycombinator.com | 2024-04-04
  • dbrx

    Code examples and resources for DBRX, a large language model developed by Databricks

  • Project mention: Hello OLMo: A Open LLM | news.ycombinator.com | 2024-04-08

    One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.

  • Finetune_LLMs

    Repo for fine-tuning Causal LLMs

  • discus

    A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ

  • Project mention: an open source package helping developers generate data for LLMs | /r/mlops | 2023-08-02
  • Auto-Data

    Auto Data is a library designed for quick and effortless creation of datasets tailored for fine-tuning Large Language Models (LLMs).

  • Project mention: Show HN: Fine-Tuning Data Generator Written Purely in Python | news.ycombinator.com | 2024-04-10
NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source llm-training projects in Python? This list will help you:

#  Project        Stars
1  ludwig         10,801
2  skypilot       5,636
3  dbrx           2,363
4  Finetune_LLMs  438
5  discus         62
6  Auto-Data      48
