Llama.rs – Rust port of llama.cpp for fast LLaMA inference on CPU

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llm

    An ecosystem of Rust libraries for working with large language models

  • I've counted three different Rust LLaMA implementations on the r/rust subreddit this week:

    https://github.com/Noeda/rllama/ (pure Rust+OpenCL)

    https://github.com/setzer22/llama-rs/ (ggml based)

    https://github.com/philpax/ggllama (also ggml based)

    There's also a GitHub issue on setzer's repo about collaborating across these separate efforts: https://github.com/setzer22/llama-rs/issues/4

  • llama.cpp

    LLM inference in C/C++

  • I feel like https://github.com/ggerganov/llama.cpp/issues/171 is a better approach here?

    With how fast llama.cpp is changing, this seems like a lot of churn for no reason.
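
    For intuition, the bindings approach that comment favors keeps llama.cpp as the single implementation and calls it from Rust through FFI, so the Rust side doesn't have to chase upstream changes. The minimal sketch below shows the mechanism using libc's strlen as a stand-in; real bindings would declare llama.cpp's exported C functions the same way (omitted here, as an assumption-free placeholder).

        use std::ffi::CString;
        use std::os::raw::c_char;

        extern "C" {
            // Declared, not reimplemented: the symbol comes from the C
            // library we link against. A llama.cpp binding would list
            // its exported functions here instead of libc's.
            fn strlen(s: *const c_char) -> usize;
        }

        fn main() {
            let s = CString::new("hello from rust ffi").unwrap();
            // Safety: `s` is a valid NUL-terminated C string for the
            // duration of the call.
            let n = unsafe { strlen(s.as_ptr()) };
            println!("strlen = {n}");
        }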

  • rllama

    Rust+OpenCL+AVX2 implementation of LLaMA inference code
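
    As a rough illustration of the AVX2 side, here is a minimal dot-product kernel of the kind CPU inference paths like this lean on for matrix-vector products. It is a sketch only; the names are illustrative and not taken from the rllama codebase (x86_64 only).

        use std::arch::x86_64::*;

        /// Dot product of two f32 slices using 256-bit FMA lanes.
        /// Safety: caller must confirm AVX2 and FMA are available.
        #[target_feature(enable = "avx2,fma")]
        unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
            assert_eq!(a.len(), b.len());
            let mut acc = _mm256_setzero_ps();
            let chunks = a.len() / 8;
            for i in 0..chunks {
                let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
                let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
                acc = _mm256_fmadd_ps(va, vb, acc); // acc += va * vb
            }
            // Horizontal sum of the 8 lanes, then a scalar tail for
            // lengths not divisible by 8.
            let mut buf = [0.0f32; 8];
            _mm256_storeu_ps(buf.as_mut_ptr(), acc);
            let mut sum: f32 = buf.iter().sum();
            for i in chunks * 8..a.len() {
                sum += a[i] * b[i];
            }
            sum
        }

        fn main() {
            let a: Vec<f32> = (0..19).map(|i| i as f32 * 0.5).collect();
            let b: Vec<f32> = vec![2.0; 19];
            if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
                let s = unsafe { dot_avx2(&a, &b) };
                println!("dot = {s}");
            }
        }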

  • ggllama

    `ggllama` is a discontinued Rust port of ggerganov's llama.cpp.

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMA using GPTQ

  • Do you know if any of them support GPTQ [1], either end-to-end or just by importing weights that were previously quantized with GPTQ? Apparently GPTQ provides a significant quality boost “for free”.

    I haven’t had time to look into this in detail, but apparently llama.cpp doesn’t support it yet [2], though it will soon. And the original implementation only works with CUDA.

    [1] https://github.com/qwopqwop200/GPTQ-for-LLaMa/

    [2] https://github.com/ggerganov/llama.cpp/issues/9
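
    For intuition about what 4-bit weight quantization means here: the naive baseline is round-to-nearest with a per-group scale and offset, sketched below. GPTQ's contribution is quantizing weights one column at a time and updating the remaining weights to compensate for each rounding error, which is where the "for free" quality boost comes from. This is an illustrative sketch, not the GPTQ algorithm.

        /// Round-to-nearest 4-bit quantization of a weight slice with a
        /// per-slice scale and offset (illustrative, not GPTQ).
        fn quantize_rtn_4bit(weights: &[f32]) -> (Vec<u8>, f32, f32) {
            let min = weights.iter().cloned().fold(f32::INFINITY, f32::min);
            let max = weights.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let scale = ((max - min) / 15.0).max(f32::EPSILON); // 16 levels
            let q = weights
                .iter()
                .map(|&w| (((w - min) / scale).round() as u8).min(15))
                .collect();
            (q, scale, min)
        }

        fn dequantize(q: &[u8], scale: f32, min: f32) -> Vec<f32> {
            q.iter().map(|&v| v as f32 * scale + min).collect()
        }

        fn main() {
            let w = [0.12f32, -0.40, 0.33, 0.05, -0.21, 0.48];
            let (q, scale, min) = quantize_rtn_4bit(&w);
            let w2 = dequantize(&q, scale, min);
            // Mean squared reconstruction error of the 4-bit round trip.
            let mse: f32 = w.iter().zip(&w2).map(|(a, b)| (a - b).powi(2)).sum::<f32>()
                / w.len() as f32;
            println!("quantized: {q:?}, reconstruction MSE: {mse:e}");
        }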

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
