-
llama-peft-tuner
Tune LLaMA-7B on the Alpaca dataset using PEFT/LoRA. Based on @zphang's https://github.com/zphang/minimal-llama scripts.
-
sidekick
Discontinued. Universal APIs for unstructured data: sync documents from SaaS tools to a SQL or vector database, where they can be easily queried by AI applications. [Moved to: https://github.com/psychic-api/psychic] (by ai-sidekick)
-
azure-search-openai-demo
A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
-
xTuring
Build, customize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
Please look at the sketch library and the LangChain pandas/SQL agents. I have seen excellent results with both of these approaches. Note that both require sending metadata (e.g., dataframe column names or table schemas) to OpenAI.
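For context, here is a minimal sketch of the LangChain pandas agent (using the 2023-era langchain/openai APIs; the CSV path and the question are placeholders):

```python
import pandas as pd
from langchain.agents import create_pandas_dataframe_agent
from langchain.llms import OpenAI

# The agent puts dataframe metadata (column names, a few sample rows)
# into the prompt, then runs the model-generated pandas code locally.
df = pd.read_csv("sales.csv")  # placeholder path

agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
agent.run("Which month had the highest total revenue?")
```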
So as far as setup goes, you just need to:

```
git clone https://github.com/lxe/simple-llama-finetuner
cd simple-llama-finetuner
pip install -r requirements.txt
python app.py
```

If you're on a remote machine (Paperspace is my go-to), you may need to edit the last line of app.py to set share=True in the launch args.
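That edit would look something like this (a sketch; `demo` is a hypothetical name for the Gradio app object in app.py):

```python
# Last line of app.py: share=True asks Gradio to create a public URL,
# so you can reach the UI from outside the remote machine.
demo.launch(share=True)
```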
A project I'm working on helps with ETL for retrieval-augmented generation: https://github.com/ai-sidekick/sidekick
What some people have done is use Azure Cognitive Search as a retrieval precursor to the LLM: relevant documents are fetched from the index first, then passed to the model as grounding context.
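A minimal sketch of that retrieve-then-read pattern, assuming an existing search index with a `content` field and the 2023-era openai SDK (the endpoint, key, index name, and field name are placeholders that depend on your setup):

```python
import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<service>.search.windows.net",  # placeholder
    index_name="docs",                                # placeholder
    credential=AzureKeyCredential("<search-key>"),    # placeholder
)

question = "How do I rotate my API keys?"

# Step 1: retrieve the top-matching documents from the search index.
results = search_client.search(question, top=3)
context = "\n\n".join(doc["content"] for doc in results)  # field name depends on your index schema

# Step 2: ask the LLM to answer using only the retrieved sources.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"Answer using only these sources:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(response["choices"][0]["message"]["content"])
```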
It's really not helpful to make strong assertions like this without referring to specific, verifiable sources. Fine-tuning is very typically done with certain layers/parameters of the model frozen, precisely to avoid the sort of loss we are discussing. The LoRA paper itself states that LoRA "freezes the pre-trained model weights".
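To make the freezing concrete, here is a short sketch using the Hugging Face peft library (the checkpoint name is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # placeholder checkpoint

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# All pre-trained weights now have requires_grad=False; only the
# injected low-rank A/B matrices are trainable.
model.print_trainable_parameters()
```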
I'm currently working on an open-source project for building and controlling LLMs: https://github.com/stochasticai/xturing
Related posts
- TypeGPT - an open source Python library that makes GPT outputs consistent.
- Show HN: Chie – a cross-platform, native, and extensible desktop client for LLMs
- Are there any tools or frameworks similar to "langchain" or "llamaindex" but implemented or designed in a language other than Python?
- llm-chain: Rust crate for building LLM chains
- Zicklein - a German 🇩🇪 finetuned LLaMA-7b base model (OS)