GGML – AI at the Edge

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama.cpp

    LLM inference in C/C++

  • Its graph execution is still full of busy-loops, e.g.:

    https://github.com/ggerganov/llama.cpp/blob/44f906e8537fcec9...

    I wonder how much more efficient it would be if the Taskflow library were used instead, or even Intel TBB. A rough sketch of the idea follows below.
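
    To make the suggestion concrete: Taskflow (https://github.com/taskflow/taskflow) builds an explicit dependency graph and runs it on a work-stealing executor whose idle workers sleep instead of spinning. This is only a minimal sketch; the node bodies are hypothetical stand-ins for ggml graph ops, not actual llama.cpp code.

    ```cpp
    #include <taskflow/taskflow.hpp> // header-only work-stealing task scheduler

    int main() {
        tf::Executor executor;  // thread pool whose idle workers block rather than busy-wait
        tf::Taskflow taskflow;

        // Hypothetical stand-ins for the nodes of an inference graph.
        auto [matmul, norm, act, logits] = taskflow.emplace(
            [] { /* attention matmul */ },
            [] { /* layer norm       */ },
            [] { /* activation       */ },
            [] { /* write logits     */ }
        );

        matmul.precede(norm);   // norm runs only after matmul finishes, and so on
        norm.precede(act);
        act.precede(logits);

        executor.run(taskflow).wait(); // blocks until the whole graph completes
    }
    ```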

  • tinygrad

    Discontinued. You like pytorch? You like micrograd? You love tinygrad! ❤️ [Moved to: https://github.com/tinygrad/tinygrad] (by geohot)

  • Might be a silly question, but is GGML a similar/competing library to George Hotz's tinygrad [0]?

    [0] https://github.com/geohot/tinygrad

  • MeZO

    [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333

  • If MeZO gets implemented, we are basically there: https://github.com/princeton-nlp/MeZO
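
    MeZO's trick is that fine-tuning needs only forward passes: perturb the weights with Gaussian noise z, measure the loss difference, and step along z, regenerating z from a stored RNG seed so it never has to be kept in memory. Below is a minimal sketch of that zeroth-order step on a toy objective, not the paper's implementation.

    ```cpp
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Toy objective standing in for a model's loss; minimum at w = 1.
    double loss(const std::vector<double>& w) {
        double l = 0.0;
        for (double x : w) l += (x - 1.0) * (x - 1.0);
        return l;
    }

    // w += scale * z, where z ~ N(0, I) is regenerated from `seed` each time
    // instead of being stored -- the trick that keeps memory at inference level.
    void perturb(std::vector<double>& w, std::uint64_t seed, double scale) {
        std::mt19937_64 rng(seed);
        std::normal_distribution<double> gauss(0.0, 1.0);
        for (double& x : w) x += scale * gauss(rng);
    }

    int main() {
        std::vector<double> w(8, 0.0); // toy "model" weights
        const double eps = 1e-3, lr = 1e-2;
        std::mt19937_64 seeder(42);

        for (int step = 0; step < 1000; ++step) {
            std::uint64_t seed = seeder();
            perturb(w, seed, +eps);             // theta + eps*z  (forward pass 1)
            double lp = loss(w);
            perturb(w, seed, -2.0 * eps);       // theta - eps*z  (forward pass 2)
            double lm = loss(w);
            perturb(w, seed, +eps);             // restore theta exactly
            double g = (lp - lm) / (2.0 * eps); // projected gradient estimate
            perturb(w, seed, -lr * g);          // theta -= lr * g * z, z regenerated
        }
        std::printf("final loss: %f\n", loss(w)); // approaches 0
    }
    ```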

  • mlc-llm

    Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

  • I don't know... Hippo is closed source for now.

    It's comparable to Apache TVM's Vulkan in speed on CUDA; see https://github.com/mlc-ai/mlc-llm

    But honestly, the biggest advantage of llama.cpp for me is being able to split a model so performantly. My puny 16GB laptop can just barely, but very practically, run LLaMA 30B at almost 3 tokens/s. That is crazy!

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMA using GPTQ

  • With a single NVIDIA 3090 and the fastest inference branch of GPTQ-for-LLaMa https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/fastest-i..., I get a healthy 10-15 tokens per second on the 30B models. IMO GGML is great (and I totally use it), but it's still not as fast as running the models on GPU for now.
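
    For intuition, the sketch below shows only the basic 4-bit round-trip (round-to-nearest with one scale per group) that formats like GGML's Q4 also build on. GPTQ itself goes further: it quantizes a weight matrix column by column and compensates each rounding error using second-order (Hessian) information, which is why it loses less accuracy at the same bit width.

    ```cpp
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // One group of weights quantized to 4 bits with a single shared scale.
    struct QuantGroup {
        float scale;
        std::vector<int8_t> q; // each value in [-8, 7], i.e. 4 bits
    };

    // Plain round-to-nearest quantization -- a simplification of what
    // GPTQ and the GGML Q4 formats do.
    QuantGroup quantize_q4(const std::vector<float>& x) {
        float amax = 0.0f;
        for (float v : x) amax = std::max(amax, std::fabs(v));

        QuantGroup g;
        g.scale = amax / 7.0f; // map [-amax, amax] onto integer range [-7, 7]
        for (float v : x) {
            int qi = g.scale > 0.0f ? (int) std::lround(v / g.scale) : 0;
            g.q.push_back((int8_t) std::clamp(qi, -8, 7));
        }
        return g;
    }

    int main() {
        std::vector<float> w = {0.12f, -0.80f, 0.33f, 0.05f};
        QuantGroup g = quantize_q4(w);
        for (size_t i = 0; i < w.size(); ++i) {
            // original -> 4-bit code -> dequantized value
            std::printf("%+.3f -> %3d -> %+.3f\n", w[i], g.q[i], g.q[i] * g.scale);
        }
    }
    ```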

  • willow

    Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative

  • Shameless plug, I'm the founder of Willow[0].

    In short you can:

    1) Run a local Willow Inference Server[1]. It supports CPU or CUDA and is just about the fastest implementation of Whisper out there for "real time" speech.

    2) Run local command detection on device. We pull your Home Assistant entities on setup and define basic grammar for them, but any English commands (up to 400) that can be processed by Home Assistant are recognized directly on the $50 ESP BOX device and sent to Home Assistant (or openHAB, or a REST endpoint, etc.) for processing.

    Whether WIS or local, our performance target is 500ms from end of speech to command executed.

    [0] - https://github.com/toverainc/willow

    [1] - https://github.com/toverainc/willow-inference-server

  • willow-inference-server

    Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

  • ggml

    Tensor library for machine learning
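
    At its core, ggml builds a static compute graph inside one caller-provided memory buffer and then evaluates it. The sketch below is adapted from the example in the ggml README (computing f(x) = a*x^2 + b); the exact compute call has shifted across ggml versions, so treat the names as approximate.

    ```cpp
    #include <cstdio>

    #include "ggml.h" // from https://github.com/ggerganov/ggml

    int main() {
        // ggml never mallocs during graph evaluation: all tensors live in
        // this one caller-sized buffer.
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16 * 1024 * 1024,
            /*.mem_buffer =*/ nullptr,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        // Define the graph f(x) = a*x^2 + b symbolically...
        struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
        struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
        struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
        struct ggml_tensor * f = ggml_add(ctx, ggml_mul(ctx, a, ggml_mul(ctx, x, x)), b);

        struct ggml_cgraph gf = ggml_build_forward(f);

        // ...then bind values and evaluate the whole graph.
        ggml_set_f32(x, 2.0f);
        ggml_set_f32(a, 3.0f);
        ggml_set_f32(b, 4.0f);
        ggml_graph_compute_with_ctx(ctx, &gf, /*n_threads=*/4);

        std::printf("f = %f\n", ggml_get_f32_1d(f, 0)); // 3*2^2 + 4 = 16
        ggml_free(ctx);
    }
    ```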

  • whisper.spm

    whisper.cpp package for the Swift Package Manager

  • whisper.cpp is optimized for Apple Silicon and is available as a Swift package

    https://github.com/ggerganov/whisper.spm
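
    whisper.spm wraps whisper.cpp's C API for Swift; the same transcription loop written directly against the underlying library looks roughly like this (2023-era API names, a placeholder model path, and one second of silence standing in for real 16 kHz mono audio):

    ```cpp
    #include <cstdio>
    #include <vector>

    #include "whisper.h" // from https://github.com/ggerganov/whisper.cpp

    int main() {
        // Placeholder model path; download a ggml-format Whisper model first.
        struct whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin");
        if (!ctx) return 1;

        // whisper.cpp expects 16 kHz mono float PCM samples.
        std::vector<float> pcm(16000, 0.0f);

        whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
        params.n_threads = 4;

        // Run the full encoder/decoder pipeline, then print each text segment.
        if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
            for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
                std::printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }

        whisper_free(ctx);
    }
    ```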

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • VLLM: 24x faster LLM serving than HuggingFace Transformers

    3 projects | news.ycombinator.com | 20 Jun 2023
  • Show HN: Willow Inference Server: Optimized ASR/TTS/LLM for Willow/WebRTC/REST

    3 projects | news.ycombinator.com | 23 May 2023
  • Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old

    1 project | news.ycombinator.com | 28 Feb 2024
  • Now I Can Just Print That Video

    5 projects | news.ycombinator.com | 4 Dec 2023
  • Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller

    14 projects | news.ycombinator.com | 31 Oct 2023