Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding

llama.cpp

773 56,891 10.0 C++

LLM inference in C/C++

The speedup would not be that high in practice for folks already using speculative sampling[1]. ANPD appears to be similar but uses a simpler, faster, and less accurate drafting approach. These two enhancements can't be meaningfully stacked.
[1] https://github.com/ggerganov/llama.cpp/pull/2926
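To make the "lossless" and "can't be stacked" points concrete, here is a minimal sketch of greedy draft-and-verify decoding. It is not the ANPD or llama.cpp implementation. The `target` and `draft` callables are hypothetical stand-ins for an expensive model and a cheap drafter, and real systems verify the whole draft in one batched forward pass instead of one call per token. Speculative sampling proper also uses a rejection-sampling step when decoding with sampling; this sketch covers only the greedy case.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: given the sequence so far, return the greedy next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def draft_and_verify(target: NextToken, draft: NextToken,
                     prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify: the drafter guesses up to k tokens, the target checks them."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap drafter proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each drafted position in order.
        for guess in proposal:
            correct = target(seq)
            seq.append(correct)   # always keep the target's own token
            if guess != correct:  # first mismatch: discard the rest of the draft
                break
    return seq
```

Because every appended token comes from `target`, the output matches plain greedy decoding regardless of how good the drafter is. The drafter only changes how many positions are accepted per verification round, which is why swapping one drafter for another replaces the speedup rather than adding to it.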

transformers

176 125,369 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

The HuggingFace transformers library already supports a similar method called prompt lookup decoding, which builds an n-gram model from the existing context: https://github.com/huggingface/transformers/issues/27722
I don't think it would be that hard to swap that out for a pretrained n-gram model.
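As a rough illustration of the drafting step in prompt lookup decoding, the sketch below searches the context for an earlier copy of the trailing n-gram and proposes the tokens that followed it. This is an assumption-laden toy, not the transformers implementation: the function name `prompt_lookup_draft` and its parameters `ngram_size` and `num_draft` are hypothetical, and tokens are plain integers.

```python
from typing import List


def prompt_lookup_draft(tokens: List[int], ngram_size: int = 2,
                        num_draft: int = 3) -> List[int]:
    """Propose draft tokens by finding an earlier copy of the trailing n-gram."""
    if len(tokens) <= ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan right to left so the most recent earlier match wins.
    # The range stops one short of the trailing n-gram itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = start + ngram_size
            return tokens[follow:follow + num_draft]
    return []  # no match: fall back to ordinary decoding for this step
```

The proposed tokens would then be checked by the full model, so a bad guess costs speed but not correctness. Swapping in a pretrained n-gram model, as the comment suggests, would amount to replacing the context scan with a lookup in a precomputed table.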

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

AI enthusiasm #6 - Finetune any LLM you want💡
2 projects | dev.to | 16 Apr 2024

Schedule-Free Learning – A New Way to Train
3 projects | news.ycombinator.com | 6 Apr 2024

Gemma doesn't suck anymore – 8 bug fixes
3 projects | news.ycombinator.com | 11 Mar 2024

HuggingFace Transformers: Qwen2
1 project | news.ycombinator.com | 11 Jan 2024

HuggingFace Transformers Release v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2
1 project | news.ycombinator.com | 13 Dec 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
NLP llama Natural Language Processing llm Pytorch
Post date: 21 Apr 2024
