Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • Ollama is awesome. I am, however, still waiting for support for controlling the model cache location: https://github.com/jmorganca/ollama/issues/153

    This is either for backup purposes or to share model files with other applications. Those model files are large!
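
    (For illustration, here is a minimal Python sketch of one workaround, assuming the default store lives under ~/.ollama/models on Linux/macOS: relocate the directory and leave a symlink behind. The target path is hypothetical, and `ollama serve` should be stopped before moving anything.)

        import shutil
        from pathlib import Path

        # Hedged workaround sketch: move the assumed default model store
        # (~/.ollama/models) to a bigger or shared disk, then symlink it
        # back so Ollama still finds the files at the old path.
        default = Path.home() / ".ollama" / "models"
        target = Path("/mnt/shared/ollama-models")  # hypothetical location

        if default.exists() and not default.is_symlink():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(default), str(target))
            default.symlink_to(target, target_is_directory=True)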

  • llama.cpp

    LLM inference in C/C++

  • > run their own LLMs on Linux and the unfortunate answer was always that the existing options were slightly complicated

    What about https://github.com/ggerganov/llama.cpp ?

    It compiles and runs easily on Linux.

  • mlc-llm

    Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

  • Maybe they're talking about https://github.com/mlc-ai/mlc-llm which is used for web-llm (https://github.com/mlc-ai/web-llm)? Seems to be using TVM.

  • web-llm

    Bringing large-language models and chat to web browsers. Everything runs inside the browser with no server support.

  • Dumbar

    A smrt, no, smart, ok, no dumb smartbar for Ollama

  • give Dumbar a try, since you're on macOS! https://github.com/JerrySievert/Dumbar

  • cody

    AI that knows your entire codebase

  • Ollama is awesome. I am part of a team building a code AI application[1], and we want to give devs the option to run it locally instead of only supporting external LLMs from Anthropic, OpenAI, etc. Those big remote LLMs are incredibly powerful and probably the right choice for most devs, but it's good for devs to have a local option as well—for security, privacy, cost, latency, simplicity, freedom, etc.

    As app devs, we have two choices:

    (1) Build our own support for LLMs, GPU/CPU execution, model downloading, inference optimizations, etc.

    (2) Just tell users "run Ollama" and have our app hit the Ollama API on localhost (or shell out to `ollama`); see the rough sketch below.

    Obviously choice 2 is much, much simpler. There are some things in the middle, like less polished wrappers around llama.cpp, but Ollama is the only thing that 100% of the people I've told about it have been able to install without any problems.

    That's huge because it's finally possible to build real apps that use local LLMs—and still reach a big userbase. Your userbase is now (pretty much) "anyone who can download and run a desktop app and who has a relatively modern laptop", which is a big population.

    I'm really excited to see what people build on Ollama.

    (And Ollama will simplify deploying server-side LLM apps as well, but right now from participating in the community, it seems most people are only thinking of it for local apps. I expect that to change when people realize that they can ship a self-contained server app that runs on a cheap AWS/GCP instance and uses an Ollama-executed LLM for various features.)

    [1] Shameless plug for the WIP PR where I'm implementing Ollama support in Cody, our code AI app: https://github.com/sourcegraph/cody/pull/905.
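
    As a rough sketch of choice (2), the snippet below calls Ollama's generate endpoint on localhost (default port 11434); the model name and prompt are placeholders, and the endpoint shape is taken from Ollama's published API, so treat it as an assumption rather than a spec.

        import json
        import urllib.request

        # Minimal sketch: treat a locally running `ollama serve` as the
        # backend and hit its HTTP API on localhost (default port 11434).
        def generate(prompt: str, model: str = "llama2") -> str:
            body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
            req = urllib.request.Request(
                "http://localhost:11434/api/generate",
                data=body,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())["response"]

        if __name__ == "__main__":
            print(generate("Write a haiku about local LLMs."))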

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

  • Koboldcpp does this: https://github.com/LostRuins/koboldcpp/releases/tag/v1.44.2

    They basically just ship executables for different llama.cpp backends and select the correct one with a Python script, which is fine, as the executables are really small.
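
    For illustration, a selection script along those lines might look like the Python sketch below. This is not KoboldCpp's actual launcher, and the binary names are hypothetical; it only shows the "ship several builds, pick one at runtime" idea.

        import shutil
        import subprocess
        import sys

        # Illustrative sketch: probe for vendor tools and launch the
        # matching prebuilt backend binary (names are hypothetical).
        def pick_binary() -> str:
            if shutil.which("nvidia-smi"):
                return "./koboldcpp_cublas"    # CUDA build
            if shutil.which("rocminfo"):
                return "./koboldcpp_hipblas"   # ROCm build
            return "./koboldcpp_openblas"      # CPU fallback

        if __name__ == "__main__":
            subprocess.run([pick_binary(), *sys.argv[1:]], check=True)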

  • koboldcpp-rocm

    AI Inferencing at the Edge. A simple one-file way to run various GGML models with KoboldAI's UI with AMD ROCm offloading

    This one is basically SOTA for AMD, if you can install ROCm properly:

    https://github.com/YellowRoseCx/koboldcpp-rocm

  • triton

    Development repository for the Triton language and compiler

  • There's a ton of cool opportunity in the runtime layer. I've been keeping my eye on the compiler-based approaches. From what I've gathered, many of the larger "production" inference tools use compilers:

    - https://github.com/openai/triton

  • TensorRT

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

  • - https://github.com/NVIDIA/TensorRT

    TVM and other compiler-based approaches seem to perform really well and make supporting different backends easy. A good friend who's been in this space for a while told me llama.cpp is sort of a "hand-crafted" version of what these compilers could output, which I think speaks to the craftsmanship Georgi and the ggml team have put into llama.cpp, but also to the opportunity to "compile" versions of llama.cpp for other model architectures or platforms.
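
    To give a feel for the compiler-based style mentioned above, here is the standard minimal Triton example (a vector-add kernel). It assumes the triton and torch packages and a CUDA-capable GPU, and is only a toy next to a real inference stack.

        import torch
        import triton
        import triton.language as tl

        @triton.jit
        def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
            # Each program instance handles one BLOCK_SIZE-wide slice.
            pid = tl.program_id(axis=0)
            offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
            mask = offsets < n_elements
            x = tl.load(x_ptr + offsets, mask=mask)
            y = tl.load(y_ptr + offsets, mask=mask)
            tl.store(out_ptr + offsets, x + y, mask=mask)

        def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
            out = torch.empty_like(x)
            n = out.numel()
            grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
            add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
            return out

        if __name__ == "__main__":
            a = torch.rand(4096, device="cuda")
            b = torch.rand(4096, device="cuda")
            assert torch.allclose(add(a, b), a + b)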
