My experience on starting with fine tuning LLMs with custom data


llama_index

75 31,184 10.0 Python

LlamaIndex is a data framework for your LLM applications

Thank you, OP. Your examples are truly insightful and align perfectly with what I was hoping to glean from this thread. I've been grappling with the decision of whether to first learn a library like LlamaIndex or to start with fine-tuning an LLM.

instructor-embedding

4 1,703 5.9 Python

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

If you like embeddings and vector DBs, you should look into this: https://github.com/HKUNLP/instructor-embedding
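For readers new to the embeddings plus vector DB idea mentioned above, here is a minimal sketch of the retrieval step, assuming the embeddings have already been computed. The three-dimensional vectors and document names are made-up placeholders, not output from instructor-embedding or any real model, and a real vector database would replace the plain dictionary and linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; a real embedding model would produce
# vectors with hundreds of dimensions.
documents = {
    "fine-tuning guide": [0.9, 0.1, 0.0],
    "vector database intro": [0.1, 0.9, 0.2],
    "cooking recipes": [0.0, 0.1, 0.9],
}

def search(query_vector, docs, top_k=2):
    """Return the names of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in docs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Assumed query embedding, chosen to sit close to "vector database intro".
query = [0.2, 0.8, 0.1]
results = search(query, documents)
print(results)
```

In practice the query and the documents must be embedded with the same model for the similarity scores to be meaningful.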

Nuggt

10 338 8.0 Python

An Autonomous LLM Agent that runs on Wizcoder-15B

Yes, there is a lot of potential. You can check this project for agents: https://github.com/Nuggt-dev/Nuggt/. Currently I only have "simple" projects, mostly zero-shot LLM calls to get some responses. Agents are not yet mature enough to be integrated into production environments.

semantra

17 2,271 6.6 Python

Multi-tool for semantic search

Yep, same. This works decently well: https://github.com/freedmand/semantra

h2ogpt

28 10,458 10.0 Python

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

I'm also working on the finetuning of models for Q&A and I've finetuned llama-7b, falcon-40b, and oasst-pythia-12b using HuggingFace's SFT, H2OGPT's finetuning script and lit-gpt.

lit-gpt

4 5,243 9.6 Python

Discontinued. Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed. [Moved to: https://github.com/Lightning-AI/litgpt]

I'm also working on the finetuning of models for Q&A and I've finetuned llama-7b, falcon-40b, and oasst-pythia-12b using HuggingFace's SFT, H2OGPT's finetuning script and lit-gpt.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

[D] github repositories for ai web search agents

2 projects | /r/MachineLearning | 9 Dec 2023

Run and create custom ChatGPT-like bots with OpenChat

15 projects | news.ycombinator.com | 7 Jun 2023

I've made a customisable SMS personal assistant which has infinite and persistent semantic memory.

2 projects | /r/LocalLLaMA | 27 May 2023

Claude AI launches on iOS (Android coming soon)

2 projects | news.ycombinator.com | 1 May 2024

AgentCloud vs Google Cloud Agents

1 project | dev.to | 29 Apr 2024

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA
Embeddings chatgpt information-retrieval llm language-model
Post date: 10 Jul 2023
