Alpaca: An Instruction-Tuned LLaMA 7B. Responses on par with text-davinci-003. Demo up

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Might I suggest looking at the story between the 2nd and 10th of March? I've noticed Hacker News hasn't been following certain areas of the effort. A lot of great work has happened, and continues to happen, in close conjunction with text-generation-webui (seriously, most of the cutting-edge work with 4-bit GPTQ etc. has been closely tied to the project).

    >https://github.com/oobabooga/text-generation-webui/

  • stanford_alpaca

    Code and documentation to train Stanford's Alpaca models, and generate the data.

  • Here's a link that opens their training data (52,000 rows) in Datasette Lite: https://lite.datasette.io/?json=https://github.com/tatsu-lab...

    That means you can run SQL LIKE queries against it to get a feel for what's in there (a minimal sketch of doing the same locally follows below).
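
    For example, here is a minimal sketch of loading that data into SQLite locally and running a LIKE query with Python. The file name alpaca_data.json and the column names (instruction, input, output) are assumptions based on the stanford_alpaca repo, not something this page specifies:

    ```python
    import json
    import sqlite3

    # Load the Alpaca training data (assumed: alpaca_data.json from the
    # stanford_alpaca repo, with instruction/input/output fields) into an
    # in-memory SQLite table.
    with open("alpaca_data.json") as f:
        rows = json.load(f)

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE alpaca (instruction TEXT, input TEXT, output TEXT)")
    db.executemany("INSERT INTO alpaca VALUES (:instruction, :input, :output)", rows)

    # Example LIKE query: find instructions that mention poems.
    query = "SELECT instruction, output FROM alpaca WHERE instruction LIKE '%poem%' LIMIT 5"
    for instruction, output in db.execute(query):
        print(instruction, "->", output[:80])
    ```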

  • llama

    Inference code for Llama models

  • This is why I think we're seeing a Stable Diffusion moment for LLMs: https://simonwillison.net/2023/Mar/11/llama/

    Look at the timeline:

    24th February 2023: LLaMA is announced, starts being shared with academic partners: https://research.facebook.com/publications/llama-open-and-ef...

    2nd March: Someone posts a PR with a BitTorrent link to the models: https://github.com/facebookresearch/llama/pull/73

    10th March: First commit to llama.cpp by Georgi Gerganov: https://github.com/ggerganov/llama.cpp/commit/26c084662903dd...

    11th March: llama.cpp now runs the 7B model on a 4GB Raspberry Pi: https://twitter.com/miolini/status/1634982361757790209

    13th March (today): llama.cpp on a Pixel 6 phone: https://twitter.com/thiteanish/status/1635188333705043969

    And now, Alpaca. It's not even lunchtime yet!

  • llama.cpp

    LLM inference in C/C++

  • llama

    Inference code for LLaMA models (by gmorenz)

  • > All the magic of "7B LLaMA running on a potato" seems to involve lowering precision down to f16 and then further quantizing to int4.

    LLaMA weights are f16 to start with, so no lowering is necessary to get there.

    You can stream weights from RAM to the GPU pretty efficiently. If you have >= 32 GB of RAM and >= 2 GB of VRAM, my code here should work for you (the sketch below shows the core idea): https://github.com/gmorenz/llama/tree/gpu_offload

    There's probably a cleaner version of it somewhere else. Really you should only need >= 16 GB of RAM, but the (Meta-provided) code for loading the initial weights makes two copies of the weights in RAM simultaneously, which is completely unnecessary.
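
    For the curious, the core of the streaming idea looks roughly like this in PyTorch. This is a hypothetical simplification, not gmorenz's actual code (which lives in the gpu_offload branch linked above):

    ```python
    import torch
    from torch import nn

    @torch.no_grad()
    def streamed_forward(layers: nn.ModuleList, x: torch.Tensor, device="cuda"):
        """Run a transformer stack while keeping weights in CPU RAM.

        Each layer's weights are copied to the GPU just before the layer
        runs and moved back afterwards, so peak VRAM usage is roughly one
        layer's weights plus the activations.
        """
        x = x.to(device)
        for layer in layers:
            layer.to(device)   # stream this layer's weights RAM -> VRAM
            x = layer(x)
            layer.to("cpu")    # free VRAM before loading the next layer
        return x
    ```

    A production version would typically overlap the weight transfers with compute; this synchronous loop is just the simplest form of the trick.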

  • Open-Assistant

    OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

  • Such a thing already exists, and there are already some results: https://open-assistant.io

    I'm not sure why the authors of Alpaca didn't try to train it on this dataset.

  • self-instruct

    Aligning pretrained language models with instruction data generated by themselves.

  • It says

    > We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003

    Which leads to self-instruct: https://github.com/yizhongw/self-instruct

    From a glance, they used an LM to classify instructions and train the model, which IMHO is very similar to RLHF (a simplified sketch of the loop follows below).
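
    For intuition, the self-instruct bootstrapping loop looks roughly like this. This is a hypothetical sketch: complete and is_novel stand in for the LLM call (parsed into an instruction/output pair) and the repo's similarity-based filter:

    ```python
    import random

    def self_instruct(seed_tasks, complete, is_novel, n_target=52_000):
        """Sketch of self-instruct-style data generation.

        seed_tasks: human-written (instruction, output) pairs
        complete:   wraps an LLM such as text-davinci-003 and parses its
                    continuation into an (instruction, output) pair
        is_novel:   rejects instructions too similar to ones already kept
        """
        pool = list(seed_tasks)
        generated = []
        while len(generated) < n_target:
            # Show the model a few in-context examples from the pool...
            demos = random.sample(pool, k=min(3, len(pool)))
            prompt = "\n\n".join(
                f"Instruction: {ins}\nOutput: {out}" for ins, out in demos
            ) + "\n\nInstruction:"
            # ...and have it continue with a new instruction and output.
            instruction, output = complete(prompt)
            if is_novel(instruction, pool):
                pool.append((instruction, output))
                generated.append({"instruction": instruction, "output": output})
        return generated  # Alpaca-style fine-tuning data
    ```

    Note that this produces supervised fine-tuning data rather than a reward model, which is why it is much cheaper than full RLHF even if the flavor is similar.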

  • ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

  • Also today: ChatGLM released by Tsinghua University. I've made a separate submission for it: https://news.ycombinator.com/item?id=35150190

    The GitHub page is https://github.com/THUDM/ChatGLM-6B. It's all in Chinese, but the model handles English queries well on a single consumer GPU. Considering its size, I'd say the quality of its responses is outstanding (a quick-start sketch follows below).
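
    For reference, the repo's quick-start boils down to a few lines with transformers (a sketch; the exact API may vary between releases):

    ```python
    from transformers import AutoModel, AutoTokenizer

    # Load ChatGLM-6B via the custom modeling code shipped with the repo
    # (hence trust_remote_code=True). .half() keeps the weights in f16 so
    # the model fits on a single consumer GPU (roughly 13 GB of VRAM).
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
    model = model.eval()

    # chat() is a helper defined by the repo's modeling code, not stock transformers.
    response, history = model.chat(tokenizer, "What can you do?", history=[])
    print(response)
    ```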

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
