Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
We're looking for feedback on the project, and we'd love to hear from you! If you're interested in contributing, please reach out to us on our Discord, or post an issue on our GitHub.
The current direction most people are taking is to "fine-tune" an existing base model (something like StableLM or Dolly) using a technique called LoRA. Ref: https://github.com/tloen/alpaca-lora People are also looking into fine-tuning directly on the GGML format. Ref: https://github.com/ggerganov/ggml/issues/8
At present, we are powered by ggml (similar to llama.cpp), but we intend to add additional backends in the near-future. This means that we currently only support CPU inference, but we have several ideas in mind for how to add GPU support, as well as other accelerators.
The current direction most people are taking is to "fine-tune" an existing base model (something like StableLM or Dolly) using a technique called LoRA. Ref: https://github.com/tloen/alpaca-lora People are also looking into fine-tuning directly on the GGML format. Ref: https://github.com/ggerganov/ggml/issues/8
You could try looking at the min-GPT example of tch-rs. I'd also strongly suggest watching Karpathy's video to understand what's going on.