The Coming of Local LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • more-ane-transformers

    Run transformers (incl. LLMs) on the Apple Neural Engine.

  • Apple should get working on a version of the Neural Engine that is useful for these models, and remove the 3GB size limit [1] to take full advantage of the 'unified' memory architecture. Game changer.

    Waste of die space currently

    [1] https://github.com/smpanaro/more-ane-transformers/blob/main/...
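
For a sense of scale on that 3 GB limit, a rough back-of-the-envelope sketch (my arithmetic, with illustrative parameter counts and byte widths — not figures from the linked issue):

```python
import math

def ane_chunks_needed(n_params_billion: float, bytes_per_param: float,
                      limit_gb: float = 3.0) -> int:
    """Rough count of chunks a model would need to be split into to fit
    under a per-model size cap (illustrative arithmetic only)."""
    model_gb = n_params_billion * bytes_per_param  # ~1e9 params per "B"
    return math.ceil(model_gb / limit_gb)

# A 7B model at fp16 (~2 bytes/param) is ~14 GB -> 5 chunks under a 3 GB cap.
print(ane_chunks_needed(7, 2.0))   # 5
print(ane_chunks_needed(13, 2.0))  # 9
```

Which is why lifting the limit (or splitting models automatically) matters for unified memory.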

  • llama.cpp

    LLM inference in C/C++

  • llama-cpp-python

    Python bindings for llama.cpp

  • You can see for yourself (assuming you have the model weights) https://github.com/abetlen/llama-cpp-python

    I get around 140 ms per token running a 13B parameter model on a ThinkPad laptop with a 14-core Intel i7-9750 processor. Because it's CPU inference, the initial prompt takes longer to process, so total latency is still higher than I'd like, but I'm working on some caching solutions that should make this bearable for things like chat.
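
The prompt-caching idea mentioned above can be sketched in a few lines. This toy class is my illustration, not the commenter's code: it only counts which tokens still need evaluating, whereas a real implementation would save and restore llama.cpp's KV-cache state for the shared prefix.

```python
class PrefixCache:
    """Toy sketch of prompt caching: remember prompts already evaluated so
    a follow-up chat turn only pays for its new tokens. (Illustrative only;
    real caching stores model KV state, not token counts.)"""

    def __init__(self):
        self._seen = set()  # previously evaluated token sequences

    def tokens_to_evaluate(self, tokens):
        """Count trailing tokens not covered by the longest previously seen
        prefix, then remember the full sequence for next time."""
        longest = 0
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self._seen:
                longest = i
                break
        self._seen.add(tuple(tokens))
        return len(tokens) - longest

cache = PrefixCache()
print(cache.tokens_to_evaluate([1, 2, 3, 4]))     # 4: cold start, full prompt
print(cache.tokens_to_evaluate([1, 2, 3, 4, 5]))  # 1: only the new token
```

In a chat loop the conversation so far is a shared prefix of every new prompt, so reusing it removes most of the initial-prompt latency.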

  • rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

  • Also worth checking out https://github.com/saharNooby/rwkv.cpp which is based on Georgi's library and offers support for the RWKV family of models which are Apache-2.0 licensed.
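
To see why the INT4/INT5/INT8 options matter for CPU inference, here is a back-of-the-envelope estimate of weight storage (my arithmetic; real quantization formats add small per-block scale overhead on top of this):

```python
def weight_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB at a given quantization width
    (ignores the per-block scales/zero-points real formats carry)."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

# A 7B-parameter model at different widths:
for bits in (16, 8, 5, 4):
    print(f"{bits}-bit: {weight_gb(7, bits):.1f} GB")
```

Dropping from FP16 to 4-bit takes a 7B model from ~14 GB to ~3.5 GB, which is what makes laptop-class RAM viable.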

  • shady.ai

    Making offline AI models accessible to all types of edge devices.

  • I’ve got some of their smaller Raven models running locally on my M1 (only 16GB of RAM).

    I’m also in the middle of making it user-friendly to run these models on all platforms (built with Flutter). First macOS release will be out before this weekend: https://github.com/BrutalCoding/shady.ai

  • tinygrad

    Discontinued You like pytorch? You like micrograd? You love tinygrad! ❤️ [Moved to: https://github.com/tinygrad/tinygrad] (by geohot)

  • tinygrad

    https://github.com/geohot/tinygrad/tree/master/accel/ane

    But I have not tested it on Linux since Asahi has not yet added support.

    llama.cpp runs at 18ms per token (7B) and 200ms per token (65B) without quantization.
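
Converting those per-token latencies into throughput is simple arithmetic (this is just a unit conversion of the figures above, not a new benchmark):

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Per-token latency in milliseconds -> throughput in tokens/s."""
    return 1000.0 / ms_per_token

print(round(tokens_per_second(18), 1))   # ~55.6 tok/s for the 7B figure
print(round(tokens_per_second(200), 1))  # 5.0 tok/s for the 65B figure
```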

  • coral-pi-rest-server

    Perform inferencing of tensorflow-lite models on an RPi with acceleration from Coral USB stick

  • These ones can be plugged in with USB type c.

    https://coral.ai/products/accelerator/

    Used for accelerating inference (offline) on Linux, Mac, and Windows.

    Haven’t bought or used them but I’ve had my eyes on these for a little while!
