LLaMA: A foundational, 65B-parameter large language model

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama

    Inference code for Llama models

    > To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. People interested in applying for access can find the link to the application in our research paper.

    The closest you are going to get to the source is here: https://github.com/facebookresearch/llama

    It is still unclear whether you're even going to get access to the entire model. Even if you did, you couldn't use it in a commercial product anyway.

  • xformers

    Hackable and optimized Transformers building blocks, supporting a composable construction.

    I'm going to assume you know how to stand up and manage a distributed training cluster as a simplifying assumption. Note this is an aggressive assumption.

    You would need to replicate the preprocessing steps. Replicating these steps is going to be tricky, as they are not described in detail. Then you would need to implement the model using xformers [1]. Using xformers is going to save you a lot of compute spend. You will also need to manually implement the backward pass to reduce recomputation of expensive activations.

    The model was trained using 2048 A100 GPUs with 80 GB of VRAM each. A single 8x A100 GPU machine from Lambda Cloud costs $12.00/hr [2]. The team from Meta used 256 such machines, giving a per-day cost of $73,728. It takes 21 days to train this model. The upfront lower-bound cost estimate is therefore (12.00 * 24) * 256 * 21 = $1,548,288, assuming everything goes smoothly and your model doesn't bite it during training. You may be able to negotiate bulk pricing for these types of workloads.

    That dollar value covers the compute resources alone. Given the compute costs involved, you will probably also want a team of ML Ops engineers to monitor the training cluster and research scientists to help with the preprocessing and model pipelines.

    [1] https://github.com/facebookresearch/xformers
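
The cost estimate in the comment above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the figures quoted in the comment (2048 GPUs, 8 GPUs per machine, $12.00 per machine-hour, 21 days); the function and variable names are illustrative, and real pricing or training time may differ.

```python
# Lower-bound compute cost, using only the figures quoted in the comment.
# All names here are illustrative; none come from the original post.

GPUS_TOTAL = 2048               # A100 GPUs used for training
GPUS_PER_MACHINE = 8            # GPUs in one rented machine
PRICE_PER_MACHINE_HOUR = 12.00  # USD per machine-hour, as quoted
TRAINING_DAYS = 21              # wall-clock training time


def training_cost(gpus_total, gpus_per_machine, price_per_machine_hour, days):
    """Return (machines, cost_per_day, total_cost) for an uninterrupted run."""
    machines = gpus_total // gpus_per_machine
    cost_per_day = machines * price_per_machine_hour * 24
    total_cost = cost_per_day * days
    return machines, cost_per_day, total_cost


machines, per_day, total = training_cost(
    GPUS_TOTAL, GPUS_PER_MACHINE, PRICE_PER_MACHINE_HOUR, TRAINING_DAYS
)
print(f"machines: {machines}")
print(f"cost per day: ${per_day:,.0f}")
print(f"total: ${total:,.0f}")
```

The estimate assumes a single uninterrupted run with no restarts, failed experiments, storage, or networking costs, so it is a floor rather than a budget.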

  • gpt_index

    Discontinued. LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data. [Moved to: https://github.com/jerryjliu/llama_index]

    (creator of gpt index / llamaindex here https://github.com/jerryjliu/gpt_index)

    Funny that we had just rebranded our tool from GPT Index to LlamaIndex about a week ago to avoid potential trademark issues with OpenAI, and it turns out Meta has similar ideas around LLM+llama puns :). Must mean the name is good though!

    Also very excited to try plugging the LLaMA model into LlamaIndex; will report the results.

  • Quake-III-Arena

    Quake III Arena GPL Source Release

    You mean this code?

    https://archive.softwareheritage.org/browse/content/sha1_git...

    Do you see that notice at the top of the file? It says:

    ==

    This file is part of Quake III Arena source code.

    Quake III Arena source code is free software; you can redistribute it

  • FlexiGen

    Running large language models on a single GPU for throughput-oriented scenarios.

    If you're patient, https://github.com/FMInference/FlexGen lets you trade off GPU RAM for system RAM or even disk space.
