Run LLMs at home, BitTorrent‑style

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  • The first question I had was "what are the economics?"

    > Will Petals incentives be based on crypto, blockchain, etc.?

    No, we are working on a centralized incentive system similar to the AI Horde kudos, even though Petals is a fully decentralized system in all other aspects. We do not plan to provide a service to exchange these points for money, so you should see these incentives as "game" points designed to be spent inside our system.

    Petals is an ML-focused project designed for ML researchers and engineers; it does not have anything to do with finance. We decided to make the incentive system centralized because it is much easier to develop and maintain, so we can focus on developing features useful for ML researchers.

    https://github.com/bigscience-workshop/petals/wiki/FAQ:-Freq...

  • artbot-for-stable-diffusion

    A front-end GUI for interacting with the AI Horde / Stable Diffusion distributed cluster

  • ggml

    Tensor library for machine learning (by philpax)

  • https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md#...

    It is (IMO) a necessary and good change.

    I just specified GGUF because my 3090 cannot host a 70B model without offloading, outside of exLlama's very new ~2-bit quantization.
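The 3090 constraint above comes down to simple arithmetic. A rough back-of-envelope sketch (weights only, ignoring KV cache and activation overhead, so real usage is higher):

```python
# Approximate VRAM needed just to store 70B weights at various bit widths.
PARAMS = 70e9

def weight_gib(bits_per_weight: float) -> float:
    """Weight storage in GiB at the given quantization width."""
    return PARAMS * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4, 2.4):
    print(f"{bits:>4} bits -> {weight_gib(bits):6.1f} GiB")
```

Even 4-bit weights (~33 GiB) overflow a 24 GB card, while ~2.4 bits per weight (~20 GiB) squeezes in, which is why only the very aggressive ~2-bit quantizations avoid offloading on a 3090.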

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • This is neat. Model weights are split into their layers and distributed across several machines, which then announce themselves in a big hash table when they are ready to perform inference or fine-tuning "as a team" over their subset of the layers.

    It's early, but I've been working on hosting model weights in a Docker registry for https://github.com/jmorganca/ollama. Mainly for the content addressability (Ollama will verify the correct weights are downloaded every time), and ultimately weights can be fetched by their content instead of by their name or URL (which may change!). Perhaps a good next step might be to split the models by layers for use cases like this (or even just for downloading + running larger models over several "local" machines).
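A minimal sketch of the content-addressability idea, assuming OCI-registry-style `sha256:` digests. The helper names are illustrative, not Ollama's actual API:

```python
import hashlib
from pathlib import Path

def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256, as OCI-style registries do for blobs."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return f"sha256:{h.hexdigest()}"

def verify_weights(path: Path, expected: str) -> bool:
    """Re-derive the digest and compare: a renamed or re-hosted file still
    verifies, because identity comes from content, not from name or URL."""
    return sha256_digest(path) == expected
```

Splitting a model by layer would then mean each layer blob gets its own digest, so peers can fetch and verify exactly the subset they serve.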

  • chat.petals.dev

    💬 Chatbot web app + HTTP and Websocket endpoints for LLM inference with the Petals client

  • Hi, a dev here. `</s>` means "end of sequence" for LLMs. If a model generates it, it forgets everything and continues with unrelated random text. So I don't think that malicious actors are involved here.

    Apparently, the Colab code snippet is just too simplified and does not handle `</s>` correctly. This is not the case with the full chatbot app at https://chat.petals.dev - you can use it instead or take a look at its code.
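A minimal sketch of the handling the dev describes, where `step` is a hypothetical stand-in for a real model forward pass + sampling, and the EOS id is an assumed Llama-style value:

```python
EOS_ID = 2  # assumed end-of-sequence token id (Llama-family convention)

def generate(step, prompt_ids, max_new_tokens=64):
    """Sample tokens until EOS; sampling past EOS yields unrelated text,
    so the loop must stop the moment the model emits it."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = step(ids)       # hypothetical model call
        if next_id == EOS_ID:     # stop here instead of continuing past EOS
            break
        ids.append(next_id)
    return ids
```

An over-simplified snippet that omits the `if next_id == EOS_ID` check is exactly what makes output degrade into random continuations.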

  • axolotl

    Go ahead and axolotl questions

  • 80GB is enough, yeah.

    I'm not sure what exact LoRA/quantization settings would be ideal, but check out https://github.com/OpenAccess-AI-Collective/axolotl#config

  • LoRA

    Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

  • Somewhat yes. See "LoRA": https://arxiv.org/abs/2106.09685

    They're not composable in the sense that you can take these adaptation layers and arbitrarily combine them, but training different models while sharing a common base of weights is a solved problem.
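A toy NumPy illustration of that idea (names and shapes are illustrative): the base weight `W` stays frozen and shared, while each task trains only its own low-rank pair `(A, B)`:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2                      # hidden size, and low rank r << d

W = rng.normal(size=(d, d))       # frozen base weight, shared by every adapter

def make_adapter():
    """One LoRA adapter: W is untouched; only B @ A (rank r) gets trained."""
    A = rng.normal(size=(r, d)) * 0.01
    B = np.zeros((d, r))          # B starts at zero, so a fresh adapter is a no-op
    return A, B

def forward(x, adapter=None):
    y = x @ W.T
    if adapter is not None:
        A, B = adapter
        y = y + x @ (B @ A).T     # low-rank update: 2*d*r params vs d*d
    return y
```

Swapping `adapter` switches tasks without copying `W`, which is the "shared common base" part; what you can't do is stack arbitrary adapters together and expect their effects to combine meaningfully.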

