Training LLaMA-65B with Stanford Code

This page summarizes the projects mentioned and recommended in the original post on /r/Oobabooga

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

    By the way, https://github.com/huggingface/transformers/pull/21955 is being merged into transformers :)
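
    As a minimal sketch of what that PR enables, assuming its LlamaForCausalLM and LlamaTokenizer classes; "path/to/llama-65b-hf" is a hypothetical local path to a converted checkpoint, not something from the post:

    ```python
    # Load a converted LLaMA checkpoint and generate a short completion.
    from transformers import LlamaForCausalLM, LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-65b-hf")
    model = LlamaForCausalLM.from_pretrained(
        "path/to/llama-65b-hf",
        device_map="auto",   # shard across available GPUs (requires accelerate)
        torch_dtype="auto",  # keep the checkpoint's native dtype
    )

    prompt = "Below is an instruction that describes a task."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    ```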

  • stanford_alpaca

    Code and documentation to train Stanford's Alpaca models, and generate the data.

    Try vast.ai for lower prices. I don't know about your other queries; see https://huggingface.co/chavinlo/alpaca-native for details, as well as https://github.com/tatsu-lab/stanford_alpaca/issues/32
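
    The repo's actual entry point is its train.py script; purely as a hedged illustration of the same supervised fine-tuning idea, here is a rough sketch using the Hugging Face Trainer. The paths, hyperparameters, and simplified prompt template are illustrative assumptions, not the repo's exact configuration:

    ```python
    # A rough sketch of Alpaca-style supervised fine-tuning, NOT the
    # stanford_alpaca train.py itself.
    import json
    import torch
    from torch.utils.data import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments)

    # Simplified version of the Alpaca prompt template (assumed, not verbatim).
    PROMPT = ("Below is an instruction that describes a task. "
              "Write a response that appropriately completes the request.\n\n"
              "### Instruction:\n{instruction}\n\n### Response:\n")

    class AlpacaDataset(Dataset):
        """Tokenizes instruction/output pairs from an Alpaca-format JSON file."""
        def __init__(self, path, tokenizer, max_len=512):
            with open(path) as f:
                records = json.load(f)
            self.examples = []
            for r in records:
                text = PROMPT.format(instruction=r["instruction"]) + r["output"]
                enc = tokenizer(text, truncation=True, max_length=max_len,
                                padding="max_length", return_tensors="pt")
                ids = enc.input_ids[0]
                labels = ids.clone()
                # Ignore padding positions in the loss (pad == eos here).
                labels[ids == tokenizer.pad_token_id] = -100
                self.examples.append({"input_ids": ids,
                                      "attention_mask": enc.attention_mask[0],
                                      "labels": labels})

        def __len__(self):
            return len(self.examples)

        def __getitem__(self, i):
            return self.examples[i]

    tokenizer = AutoTokenizer.from_pretrained("path/to/llama-hf")  # hypothetical path
    if tokenizer.pad_token is None:  # LLaMA tokenizers ship without a pad token
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained("path/to/llama-hf",
                                                 torch_dtype=torch.bfloat16)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="alpaca-out", num_train_epochs=3,
                               per_device_train_batch_size=4, learning_rate=2e-5,
                               bf16=True, logging_steps=10),
        train_dataset=AlpacaDataset("alpaca_data.json", tokenizer),
    )
    trainer.train()
    ```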

  • FlexGen

    (Discontinued) Running large language models like OPT-175B/GPT-3 on a single GPU, with a focus on high-throughput generation. [Moved to: https://github.com/FMInference/FlexGen] (by Ying1123)

    Related FlexGen posts:
      • #1: Progress Update | 4 comments
      • #2: The default UI on the pinned Google Colab is buggy so I made my own frontend - YAFFOA. | 18 comments
      • #3: Paper reduces resource requirement of a 175B model down to 16GB GPU | 19 comments

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
