This really ought to mention https://github.com/oobabooga/text-generation-webui, which was the first popular UI for LLaMA and remains a popular choice for anyone running it on a GPU. It is also where GPTQ 4-bit quantization first appeared in a LLaMA-based chatbot; llama.cpp picked it up later.
One way I've been framing this in my head (and in an application I'm building) is that GPT-3 will be useful for analytic tasks, whereas GPT-4 will be required for synthetic tasks. I'm using "analytic" and "synthetic" in the same sense as in this writeup: https://github.com/williamcotton/empirical-philosophy/blob/m...
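The analytic-vs-synthetic framing amounts to a routing decision. Here is a minimal sketch of what that might look like; the task taxonomy and model names are illustrative assumptions, not anything from the linked writeup:

```python
# Hypothetical router: send "analytic" tasks (decompose/verify existing
# content) to a cheaper model, and "synthetic" tasks (generate genuinely
# new content) to a stronger one. Task categories are assumptions.
ANALYTIC_TASKS = {"classify", "extract", "summarize"}
SYNTHETIC_TASKS = {"compose", "brainstorm", "refactor"}

def pick_model(task: str) -> str:
    if task in ANALYTIC_TASKS:
        return "gpt-3.5-turbo"  # cheaper model suffices for analytic work
    if task in SYNTHETIC_TASKS:
        return "gpt-4"          # stronger model required for synthetic work
    raise ValueError(f"unknown task: {task}")

print(pick_model("classify"))  # gpt-3.5-turbo
print(pick_model("compose"))   # gpt-4
```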
Do you need additional detail that cannot be found here?
https://github.com/AUTOMATIC1111/stable-diffusion-webui
Or are you looking for the cutting edge stuff like control net?
If you want to use Colab instead, this is what I used a month or two ago:
https://colab.research.google.com/github/TheLastBen/fast-sta...
I hope other people can give you further reading.
>Until I can spin up a docker image capable of the same as OpenAI in hetzner for 30 bucks a month
I do exactly this with https://github.com/nsarrazin/serge
Since it's CPU-based it's slower than OpenAI, but still usable. Hetzner will famously install any hardware you send them for $100, so you can send them a $200 P40 24GB to run 30B GPU models at ChatGPT speeds without increasing your monthly cost.
If you're technical, just get yourself OpenAI API access, which is super cheap, and hook it up to your own self-hosted ChatGPT clone like https://github.com/magma-labs/magma-chat
The wait for GPT-4 is not as long as it used to be, and when you're using the API directly there's no censorship.
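For anyone unsure what "hook it up to the API directly" involves: a self-hosted frontend just POSTs to the chat completions endpoint. A minimal stdlib-only sketch (assumes your key is in the OPENAI_API_KEY environment variable; model name is whatever your account has access to):

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-3.5-turbo") -> urllib.request.Request:
    """Build the HTTP request a chat frontend would send to OpenAI."""
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Bearer " + os.environ["OPENAI_API_KEY"],
                 "Content-Type": "application/json"},
    )

def chat(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

A real clone would keep the growing `messages` list across turns to preserve conversation history; this sketch sends a single user message per call.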
Use https://github.com/deep-floyd/IF; it uses an LLM to generate exactly the art you need.
There's a pull request in the official LLaMA repo that adds Magnet links for all the models to the README. Until these were uploaded to HuggingFace, this PR was the primary source for most people downloading the model.
https://github.com/facebookresearch/llama/pull/73/files
Two months later, Facebook hasn't merged the change, but they also haven't deleted it or tried to censor it in any way. I find that hard to explain unless the leak really was intentional; with pretty much any large company, this kind of thing would normally get killed on sight.
About 15GB when training it in the webui.
If you use https://github.com/johnsmith0031/alpaca_lora_4bit then 30B only needs 24GB, and works on a single 3090 or $200 P40.
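The VRAM numbers above follow from simple arithmetic on the weights alone. A rough sketch (illustrative only; real usage adds overhead for activations, the KV cache, and any LoRA weights, which is why a 30B 4-bit model wants a 24GB card rather than exactly 15GB):

```python
# Back-of-envelope memory for model weights at a given quantization level.
def weight_gb(n_params: float, bits: int) -> float:
    """Gigabytes needed to store n_params weights at `bits` bits each."""
    return n_params * bits / 8 / 1e9

print(round(weight_gb(30e9, 4), 1))   # 15.0 -> 30B params at 4-bit
print(round(weight_gb(30e9, 16), 1))  # 60.0 -> same model at fp16
```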