The first question I had was "what are the economics?"
> Will Petals incentives be based on crypto, blockchain, etc.?
> No, we are working on a centralized incentive system similar to the AI Horde kudos, even though Petals is a fully decentralized system in all other aspects. We do not plan to provide a service to exchange these points for money, so you should see these incentives as "game" points designed to be spent inside our system.
> Petals is an ML-focused project designed for ML researchers and engineers; it does not have anything to do with finance. We decided to make the incentive system centralized because it is much easier to develop and maintain, so we can focus on developing features useful for ML researchers.
https://github.com/bigscience-workshop/petals/wiki/FAQ:-Freq...
https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md#...
It is (IMO) a necessary and good change.
I specified GGUF because my 3090 cannot host a 70B model without offloading, outside of ExLlama's very new ~2-bit quantization.
This is neat. Model weights are split into their layers and distributed across several machines, which then announce themselves in a distributed hash table when they are ready to perform inference or fine-tuning "as a team" over their subset of the layers.
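The layer-splitting idea can be shown with a toy sketch (this is not Petals' actual code; the function and node names are made up for illustration): contiguous blocks of layers are assigned to servers, and a shared table records who holds what.

```python
# Toy sketch: split a model's layers into contiguous blocks and record
# which server holds which block, mimicking the "announce in a hash
# table" step described above. Illustrative only.

def partition_layers(num_layers, servers):
    """Assign contiguous layer ranges to servers, as evenly as possible."""
    per, extra = divmod(num_layers, len(servers))
    table, start = {}, 0
    for i, server in enumerate(servers):
        count = per + (1 if i < extra else 0)  # spread the remainder
        table[server] = range(start, start + count)
        start += count
    return table

# Example: an 80-layer model spread over 3 machines.
assignment = partition_layers(80, ["node-a", "node-b", "node-c"])
# node-a -> layers 0..26, node-b -> 27..53, node-c -> 54..79
```

A client running inference would then route activations through the nodes in layer order, so each machine only needs enough memory for its own block.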
It's early but I've been working on hosting model weights in a Docker registry for https://github.com/jmorganca/ollama. Mainly for the content addressability (Ollama will verify the correct weights are downloaded every time) and ultimately weights can be fetched by their content instead of by their name or url (which may change!). Perhaps a good next step might be to split the models by layers for use cases like this (or even just for downloading + running larger models over several "local" machines).
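The content-addressability idea mentioned above is simple to sketch (this is a generic illustration, not Ollama's implementation): a blob is named by the hash of its bytes, so a client can verify any download against the digest it requested.

```python
# Minimal sketch of content addressing: name a blob by the SHA-256 of
# its bytes, then verify downloads against that digest. Illustrative
# only; registry formats add manifests and media types on top of this.
import hashlib

def digest(blob: bytes) -> str:
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def verify(blob: bytes, expected: str) -> bool:
    return digest(blob) == expected

weights = b"\x00\x01\x02"     # stand-in for a layer's weight data
ref = digest(weights)         # stable name, unlike a mutable tag or URL
assert verify(weights, ref)
assert not verify(weights + b"corrupt", ref)
```

Fetching by digest instead of by name is what makes the "weights can be fetched by their content" property work: two registries serving the same digest must be serving byte-identical weights.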
Hi, a dev here. `</s>` means "end of sequence" for LLMs. If a model generates it, it forgets everything and continues with unrelated random text. So I don't think that malicious actors are involved here.
Apparently, the Colab code snippet is just too simplified and does not handle `</s>` correctly. This is not the case with the full chatbot app at https://chat.petals.dev - you can use it instead or take a look at its code.
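The fix being described amounts to stopping generation when the end-of-sequence token appears, rather than feeding it back into the model. A toy illustration (the `next_token` callable is a fake stand-in for a real model, not Petals' API):

```python
EOS = "</s>"  # end-of-sequence marker used by many LLM tokenizers

def generate(next_token, prompt, max_tokens=100):
    """Collect tokens until EOS instead of letting the model run past it."""
    out = []
    for _ in range(max_tokens):
        tok = next_token(prompt + "".join(out))
        if tok == EOS:   # stop here; don't continue into unrelated text
            break
        out.append(tok)
    return "".join(out)

# Fake "model" that emits a short answer, then EOS, then garbage.
script = iter(["Hello", " world", EOS, "random", " drift"])
reply = generate(lambda _ctx: next(script), "Hi")
assert reply == "Hello world"
```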
80GB is enough, yeah.
I'm not sure what exact LORA/quantization settings would be ideal, but check out https://github.com/OpenAccess-AI-Collective/axolotl#config
Somewhat yes. See "LoRA": https://arxiv.org/abs/2106.09685
They're not composable in the sense that you can take these adaptation layers and arbitrarily combine them, but training different models while sharing a common base of weights is a solved problem.
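The core trick in the LoRA paper linked above is to keep the base weight matrix W frozen and learn a low-rank update B·A, so each task only stores the small A and B factors on top of the shared base. A pure-Python toy with 2x2 matrices (real LoRA applies this per layer at much larger sizes):

```python
# Sketch of the LoRA idea: effective weights W' = W + B @ A, where A
# and B are small low-rank factors trained per task and W stays frozen.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weights (shared)
A = [[0.5, 0.5]]                  # rank-1 factors (per-task, trainable)
B = [[1.0], [2.0]]
W_adapted = add(W, matmul(B, A))  # effective weights for this task
# [[1.5, 0.5], [1.0, 2.0]]
```

This is why sharing a common base works: swapping tasks means swapping the tiny A/B pair, not the full W.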