Llama Is Expensive

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

www.influxdata.com

featured

SaaSHub - Software Alternatives and Reviews

SaaSHub helps you find the best software and product alternatives

www.saashub.com

featured

exllama

64 2,609 9.0 Python

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

> We serve Llama on 2 80-GB A100 GPUs, as that is the minumum required to fit Llama in memory (with 16-bit precision)
Well there is your problem.
LLaMA quantized to 4 bits fits in 40GB. And it gets similar throughput split between dual consumer GPUs, which likely means better throughput on a single 40GB A100 (or a cheaper 48GB Pro GPU)
https://github.com/turboderp/exllama#dual-gpu-results
Also, I'm not sure which model was tested, but Llama 70B chat should have better performance than the base model if the prompting syntax is right. That was only reverse engineered from the Meta demo implementation recently.

InfluxDB

www.influxdata.com featured

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Mycodo – Environmental Regulation System

1 project | news.ycombinator.com | 9 May 2024
The new REPL in Python 3.13

1 project | news.ycombinator.com | 9 May 2024
Show HN: Exploring HN by mapping and analyzing 40M posts and comments for fun

2 projects | news.ycombinator.com | 9 May 2024
Show HN: Open-Source SlackAI app for those who don't want to pay $10/user/month

2 projects | news.ycombinator.com | 9 May 2024
Temporal Python – A Durable, Distributed Asyncio Event Loop

2 projects | news.ycombinator.com | 9 May 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com Post date: 20 Jul 2023

exllama

InfluxDB

Related posts

Mycodo – Environmental Regulation System

The new REPL in Python 3.13

Show HN: Exploring HN by mapping and analyzing 40M posts and comments for fun

Show HN: Open-Source SlackAI app for those who don't want to pay $10/user/month

Temporal Python – A Durable, Distributed Asyncio Event Loop