https://github.com/facebookresearch/llama
Once you get the download link by email, make sure to copy it without spaces; one option is to open it in a new tab and copy the URL from there. If you are using fish or another non-POSIX shell, switch to bash or sh before running download.sh from the repo.
I am not sure exactly how much space is needed, but it is likely north of 500GB given that there are two 70B models (the download prompt does give you the option to fetch only the smaller ones).
You can run inference for LLaMA 7B with 8GB of RAM on a CPU: https://github.com/ggerganov/llama.cpp
The major limitation for email classification, though, would be the 2048-token context limit.
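One workaround for that context limit is to trim each email to a token budget before classification. A minimal sketch, not from the source; the whitespace split here is a crude stand-in for the model's real (SentencePiece) tokenizer, which usually yields more tokens than words, so the reserve should be generous:

```python
def truncate_to_budget(text: str, max_tokens: int = 2048, reserve: int = 256) -> str:
    """Trim text to roughly fit the context window.

    Whitespace splitting approximates token counts; `reserve` leaves
    room for the classification instructions and the model's output.
    """
    budget = max_tokens - reserve
    words = text.split()
    return " ".join(words[:budget])

email_body = "lorem " * 5000  # a long email
prompt_text = truncate_to_budget(email_body)
print(len(prompt_text.split()))  # 1792 words under the defaults
```

In practice you would count tokens with the model's actual tokenizer rather than words, but the shape of the fix is the same.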
* or even deploy your own LLaMA v2 fine tune with Cog (https://github.com/a16z-infra/cog-llama-template)
Please let us know what you use this for or if you have feedback! And thanks to everyone who contributed to this model: Meta, Replicate, and the open-source community!
It depends -- do you mean as a general end-user of a chat platform or do you mean to include a model as part of an app or service?
As an end user, what I've found works in practice is to use one of the models until it gives me an answer I'm unhappy with. At that point I'll try another model and see whether the response is better. Do this for long enough and you'll get a sense of the various models' strengths and weaknesses (although the tl;dr is that if you're willing to pay GPT-4 is better than anything else across most use cases right now).
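That "try until unsatisfied, then escalate" loop is easy to automate. A minimal sketch, where the model names are illustrative and `ask` is a placeholder for whatever API client you actually call:

```python
# Sketch of the "escalate until satisfied" strategy, cheapest model first.
MODELS = ["llama-2-70b-chat", "claude-2", "gpt-4"]

def ask(model: str, prompt: str) -> str:
    # Placeholder: imagine a real per-model API call here.
    canned = {"llama-2-70b-chat": "maybe?", "gpt-4": "a good answer"}
    return canned.get(model, "")

def answer(prompt: str, is_acceptable) -> tuple:
    reply = ""
    for model in MODELS:
        reply = ask(model, prompt)
        if is_acceptable(reply):
            break  # stop escalating once the answer is good enough
    return model, reply

model, reply = answer("Summarize this email.", lambda r: "good" in r)
print(model)  # gpt-4 in this canned example
```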
For evaluating models for app integrations, I can plug an open source combined playground + eval harness I'm currently developing: https://github.com/openpipe/openpipe
We're working on integrating Llama 2 so users can test it against other models for their own workloads head to head. (We're also working on a hosted SaaS version so people don't have to download/install Postgres and Node!)
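The core of a head-to-head comparison is just scoring each model's outputs on the same prompts. A toy sketch with hard-coded data and exact-match grading (the real harness does far more; this only shows the idea):

```python
# Same prompts, per-model outputs, one reference answer each.
prompts = ["What is 2+2?", "Capital of France?"]
gold = ["4", "Paris"]
outputs = {
    "llama-2-13b": ["4", "Lyon"],
    "gpt-3.5-turbo": ["4", "Paris"],
}

def win_rate(name: str) -> float:
    # Fraction of prompts where the model's output matches the reference.
    return sum(o == g for o, g in zip(outputs[name], gold)) / len(gold)

for name in outputs:
    print(f"{name}: {win_rate(name):.0%}")
```

Real workloads usually need fuzzier grading (LLM-as-judge or embedding similarity) than exact string match.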
https://github.com/replicate/cog
Our thinking was just that a bunch of folks will want to fine-tune right away and then deploy the fine-tunes, so we're trying to make that easy... or even just deploy the models as-is on their own infra without dealing with CUDA insanity!
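For reference, a Cog deployment is driven by a `cog.yaml` next to the model code. A minimal sketch (the package versions and predictor path here are assumptions for illustration, not the template's actual contents):

```yaml
build:
  gpu: true
  python_version: "3.10"
  python_packages:
    - "torch==2.0.1"        # assumed versions, not the template's
    - "transformers==4.31.0"
predict: "predict.py:Predictor"
```

Cog packages this plus your predictor class into a container, which is what lets you sidestep the CUDA setup on the deployment side.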
If you want to try running Llama 2 locally, you can use https://github.com/jmorganca/ollama
To run Llama 2 with it:
```
ollama run llama2
```
I think using this project https://github.com/ggerganov/llama.cpp
on a CPU machine with AVX instructions would be better bang for your buck than a GPU. It depends on whether your use case can tolerate the latency.
Version that runs on the CPU: https://github.com/krychu/llama
I get about 1 word per ~1.5 seconds on a MacBook Pro M1.
So this comment inspired me to write a Roman Numeral to Integer function in our LLM-based programming language, Marsha: https://github.com/alantech/marsha/blob/main/examples/genera...
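For comparison, a hand-written plain-Python version of that function (my own sketch, not the code Marsha's LLM generates) is only a few lines:

```python
def roman_to_int(s: str) -> int:
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0
    for i, ch in enumerate(s):
        v = values[ch]
        # Subtract when a smaller numeral precedes a larger one (e.g. IV = 4).
        if i + 1 < len(s) and values[s[i + 1]] > v:
            total -= v
        else:
            total += v
    return total

print(roman_to_int("MCMXCIV"))  # 1994
```

The interesting part of the Marsha example is that the English spec plus test cases is what gets compiled; the generated Python should behave like the above.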