-
tinygrad
[Discontinued] You like pytorch? You like micrograd? You love tinygrad! ❤️ [Moved to: https://github.com/tinygrad/tinygrad] (by geohot)
-
MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
-
willow
Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
-
willow-inference-server
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
Its graph execution is still full of busy loops, e.g.:
https://github.com/ggerganov/llama.cpp/blob/44f906e8537fcec9...
I wonder how much more efficient it would be if the Taskflow library were used instead, or even Intel TBB.
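To make the comparison concrete, here is a minimal sketch of how two dependent graph nodes could be wired up with Taskflow so that idle workers sleep until their inputs are ready, instead of spinning on flags. The node names are hypothetical and this is not llama.cpp's actual graph code; it only assumes the header-only Taskflow library is available.

// Hedged sketch: expressing a dependency with Taskflow instead of a busy loop.
#include <taskflow/taskflow.hpp>
#include <cstdio>

int main() {
    tf::Executor executor;   // thread pool; workers sleep when there is no work
    tf::Taskflow taskflow;

    // Hypothetical compute nodes standing in for graph ops.
    tf::Task matmul = taskflow.emplace([] { std::puts("matmul"); });
    tf::Task add    = taskflow.emplace([] { std::puts("add"); });

    // Declare the dependency once; the scheduler wakes 'add' when 'matmul'
    // finishes, so no thread burns CPU polling a flag.
    matmul.precede(add);

    executor.run(taskflow).wait();
    return 0;
}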
Might be a silly question, but is GGML a similar/competing library to George Hotz's tinygrad [0]?
[0] https://github.com/geohot/tinygrad
If MeZO gets implemented, we are basically there: https://github.com/princeton-nlp/MeZO
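For anyone unfamiliar, the core trick in MeZO is zeroth-order optimization: estimate the gradient from two forward passes with a shared random perturbation, so no backprop (and no activation memory for it) is needed. Here is a toy sketch of that idea on a stand-in quadratic loss; it is not the actual MeZO code, just an illustration of the two-forward-pass update.

// Toy sketch of the MeZO/SPSA idea: gradient from two forward passes.
#include <cstdio>
#include <random>
#include <vector>

// Stand-in "model": loss(theta) = sum((theta_i - 1)^2), pretend it's a forward pass.
double loss(const std::vector<double>& theta) {
    double l = 0.0;
    for (double t : theta) l += (t - 1.0) * (t - 1.0);
    return l;
}

int main() {
    std::vector<double> theta(8, 0.0);   // parameters
    const double eps = 1e-3;             // perturbation scale
    const double lr  = 0.05;             // learning rate
    std::mt19937 rng(0);
    std::normal_distribution<double> gauss(0.0, 1.0);

    for (int step = 0; step < 500; ++step) {
        // Sample one random direction z (in MeZO only the RNG seed is kept,
        // so z can be regenerated instead of stored).
        std::vector<double> z(theta.size());
        for (double& zi : z) zi = gauss(rng);

        // Two forward passes: theta + eps*z and theta - eps*z.
        std::vector<double> plus = theta, minus = theta;
        for (size_t i = 0; i < theta.size(); ++i) {
            plus[i]  += eps * z[i];
            minus[i] -= eps * z[i];
        }
        double g = (loss(plus) - loss(minus)) / (2.0 * eps);  // scalar projected gradient

        // SGD step along z, scaled by the scalar estimate.
        for (size_t i = 0; i < theta.size(); ++i) theta[i] -= lr * g * z[i];
    }
    std::printf("final loss: %f\n", loss(theta));
    return 0;
}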
I don't know... Hippo is closed source for now.
It's comparable to Apache TVM's Vulkan in speed on CUDA; see https://github.com/mlc-ai/mlc-llm
But honestly, the biggest advantage of llama.cpp for me is being able to split a model so performantly. My puny 16GB laptop can just barely, but very practically, run LLaMA 30B at almost 3 tokens/s. That is crazy!
With a single NVIDIA 3090 and the fastest inference branch of GPTQ-for-LLaMa https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/fastest-i..., I get a healthy 10-15 tokens per second on the 30B models. IMO GGML is great (and I totally use it), but it's still not as fast as running the models on GPU for now.
Shameless plug, I'm the founder of Willow[0].
In short you can:
1) Run a local Willow Inference Server[1]. Supports CPU or CUDA, just about the fastest implementation of Whisper out there for "real time" speech.
2) Run local command detection on device. We pull your Home Assistant entities on setup and define basic grammar for them, but any English commands (up to 400) that can be processed by Home Assistant are recognized directly on the $50 ESP BOX device and sent to Home Assistant (or openHAB, or a REST endpoint, etc.) for processing.
Whether using WIS or local command detection, our performance target is 500 ms from end of speech to command executed.
[0] - https://github.com/toverainc/willow
[1] - https://github.com/toverainc/willow-inference-server
whisper.cpp is optimized for Apple Silicon and is available as a Swift package
https://github.com/ggerganov/whisper.spm