Llama Is Expensive

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  • > We serve Llama on 2 80-GB A100 GPUs, as that is the minimum required to fit Llama in memory (with 16-bit precision)

    Well there is your problem.

    LLaMA quantized to 4 bits fits in 40 GB. It gets similar throughput when split across two consumer GPUs, which likely means better throughput on a single 40 GB A100 (or a cheaper 48 GB Pro GPU).

    https://github.com/turboderp/exllama#dual-gpu-results

    Also, I'm not sure which model was tested, but Llama 70B chat should perform better than the base model if the prompting syntax is right. That syntax was only recently reverse engineered from Meta's demo implementation.

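The memory figures in the comment above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes the 70B-parameter model (the comment names Llama 70B chat but is unsure which model was tested), counts weights alone (no KV cache, activations, or quantization metadata), and uses a hypothetical 24 GB consumer card. The helper names are made up for this example.

```python
import math


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def min_gpus(weight_gb: float, gpu_gb: float) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(weight_gb / gpu_gb)


# Assumption: the 70B-parameter Llama variant.
N_PARAMS = 70e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit precision
q4_gb = weight_memory_gb(N_PARAMS, 4)     # 4-bit quantization

print(f"16-bit weights: {fp16_gb:.0f} GB -> {min_gpus(fp16_gb, 80)} x 80 GB A100")
print(f" 4-bit weights: {q4_gb:.0f} GB -> {min_gpus(q4_gb, 40)} x 40 GB A100")
# Assumption: 24 GB per consumer GPU (not stated in the comment).
print(f" 4-bit weights: {q4_gb:.0f} GB -> {min_gpus(q4_gb, 24)} x 24 GB consumer GPU")
```

Under these assumptions the 16-bit weights need about 140 GB, hence two 80 GB A100s, while the 4-bit weights need about 35 GB, which leaves only a few GB of headroom on a 40 GB card for everything else.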
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
