The first question I had was "what are the economics?"
> Will Petals incentives be based on crypto, blockchain, etc.?
> No, we are working on a centralized incentive system similar to the AI Horde kudos, even though Petals is a fully decentralized system in all other aspects. We do not plan to provide a service to exchange these points for money, so you should see these incentives as "game" points designed to be spent inside our system.
> Petals is an ML-focused project designed for ML researchers and engineers; it does not have anything to do with finance. We decided to make the incentive system centralized because it is much easier to develop and maintain, so we can focus on developing features useful for ML researchers.
https://github.com/bigscience-workshop/petals/wiki/FAQ:-Freq...
https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md#...
It is (IMO) a necessary and good change.
I specified GGUF because my 3090 cannot host a 70B model without offloading, outside of ExLlama's very new ~2-bit quantization.
This is neat. Model weights are split into their layers and distributed across several machines, which then announce themselves in a distributed hash table when they are ready to perform inference or fine-tuning "as a team" over their subset of the layers.
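The layer-splitting idea can be shown with a toy sketch (this is not Petals' actual code; the function and node names are made up for illustration): contiguous blocks of layers are assigned to servers, and a shared table records who holds what.

```python
# Toy sketch: split a model's layers into contiguous blocks and record
# which server holds which block, mimicking the "announce in a hash
# table" step described above. Illustrative only.

def partition_layers(num_layers, servers):
    """Assign contiguous layer ranges to servers, as evenly as possible."""
    per, extra = divmod(num_layers, len(servers))
    table, start = {}, 0
    for i, server in enumerate(servers):
        count = per + (1 if i < extra else 0)  # spread the remainder
        table[server] = range(start, start + count)
        start += count
    return table

# Example: an 80-layer model spread over 3 machines.
assignment = partition_layers(80, ["node-a", "node-b", "node-c"])
# node-a -> layers 0..26, node-b -> 27..53, node-c -> 54..79
```

A client running inference would then route activations through the nodes in layer order, so each machine only needs enough memory for its own block.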
It's early but I've been working on hosting model weights in a Docker registry for https://github.com/jmorganca/ollama. Mainly for the content addressability (Ollama will verify the correct weights are downloaded every time) and ultimately weights can be fetched by their content instead of by their name or url (which may change!). Perhaps a good next step might be to split the models by layers for use cases like this (or even just for downloading + running larger models over several "local" machines).
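The content-addressability idea mentioned above is simple to sketch (this is a generic illustration, not Ollama's implementation): a blob is named by the hash of its bytes, so a client can verify any download against the digest it requested.

```python
# Minimal sketch of content addressing: name a blob by the SHA-256 of
# its bytes, then verify downloads against that digest. Illustrative
# only; registry formats add manifests and media types on top of this.
import hashlib

def digest(blob: bytes) -> str:
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def verify(blob: bytes, expected: str) -> bool:
    return digest(blob) == expected

weights = b"\x00\x01\x02"     # stand-in for a layer's weight data
ref = digest(weights)         # stable name, unlike a mutable tag or URL
assert verify(weights, ref)
assert not verify(weights + b"corrupt", ref)
```

Fetching by digest instead of by name is what makes the "weights can be fetched by their content" property work: two registries serving the same digest must be serving byte-identical weights.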
Hi, a dev here. `</s>` means "end of sequence" for LLMs. If a model generates it, it forgets everything and continues with unrelated random text. So I don't think that malicious actors are involved here.
Apparently, the Colab code snippet is just too simplified and does not handle `</s>` correctly. This is not the case with the full chatbot app at https://chat.petals.dev - you can use it instead or take a look at its code.
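The fix being described amounts to stopping generation when the end-of-sequence token appears, rather than feeding it back into the model. A toy illustration (the `next_token` callable is a fake stand-in for a real model, not Petals' API):

```python
EOS = "</s>"  # end-of-sequence marker used by many LLM tokenizers

def generate(next_token, prompt, max_tokens=100):
    """Collect tokens until EOS instead of letting the model run past it."""
    out = []
    for _ in range(max_tokens):
        tok = next_token(prompt + "".join(out))
        if tok == EOS:   # stop here; don't continue into unrelated text
            break
        out.append(tok)
    return "".join(out)

# Fake "model" that emits a short answer, then EOS, then garbage.
script = iter(["Hello", " world", EOS, "random", " drift"])
reply = generate(lambda _ctx: next(script), "Hi")
assert reply == "Hello world"
```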
80GB is enough, yeah.
I'm not sure what exact LORA/quantization settings would be ideal, but check out https://github.com/OpenAccess-AI-Collective/axolotl#config
Somewhat yes. See "LoRA": https://arxiv.org/abs/2106.09685
They're not composable in the sense that you can take these adaptation layers and arbitrarily combine them, but training different models while sharing a common base of weights is a solved problem.
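The core trick in the LoRA paper linked above is to keep the base weight matrix W frozen and learn a low-rank update B·A, so each task only stores the small A and B factors on top of the shared base. A pure-Python toy with 2x2 matrices (real LoRA applies this per layer at much larger sizes):

```python
# Sketch of the LoRA idea: effective weights W' = W + B @ A, where A
# and B are small low-rank factors trained per task and W stays frozen.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weights (shared)
A = [[0.5, 0.5]]                  # rank-1 factors (per-task, trainable)
B = [[1.0], [2.0]]
W_adapted = add(W, matmul(B, A))  # effective weights for this task
# [[1.5, 0.5], [1.0, 2.0]]
```

This is why sharing a common base works: swapping tasks means swapping the tiny A/B pair, not the full W.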