Ollama is awesome. I am, however, still waiting for support for controlling the model cache location: https://github.com/jmorganca/ollama/issues/153
This would be useful either for backups or for sharing model files with other applications. Those model files are large!
> run their own LLMs on Linux and the unfortunate answer was always that the existing options were slightly complicated
What about https://github.com/ggerganov/llama.cpp ?
It compiles and runs easily on Linux.
Maybe they're talking about https://github.com/mlc-ai/mlc-llm which is used for web-llm (https://github.com/mlc-ai/web-llm)? Seems to be using TVM.
Give Dumbar a try, since you're on macOS! https://github.com/JerrySievert/Dumbar
Ollama is awesome. I am part of a team building a code AI application[1], and we want to give devs the option to run it locally instead of only supporting external LLMs from Anthropic, OpenAI, etc. Those big remote LLMs are incredibly powerful and probably the right choice for most devs, but it's good for devs to have a local option as well—for security, privacy, cost, latency, simplicity, freedom, etc.
As app devs, we have two choices:
(1) Build our own support for LLMs, GPU/CPU execution, model downloading, inference optimizations, etc.
(2) Just tell users "run Ollama" and have our app hit the Ollama API on localhost (or shell out to `ollama`).
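For concreteness, option (2) can be very little code. Here's a minimal stdlib-only sketch against Ollama's `/api/generate` REST endpoint (it assumes `ollama serve` is running locally and the model has already been pulled; the model name is just an example default):

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default local port

def build_request(prompt: str, model: str = "llama2") -> urllib.request.Request:
    """Build a POST to Ollama's /api/generate endpoint.

    stream=False asks for a single JSON response instead of a stream
    of partial-token objects.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(prompt: str, model: str = "llama2") -> str:
    """Run one completion. Requires a running Ollama server."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Why is the sky blue?"))
```

That's the whole integration surface: one JSON endpoint on localhost, no GPU detection or model-loading code in your app.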
Obviously, choice 2 is much, much simpler. There are options in between, like less polished wrappers around llama.cpp, but Ollama is the only one that 100% of the people I've recommended it to have been able to install without any problems.
That's huge because it's finally possible to build real apps that use local LLMs—and still reach a big userbase. Your userbase is now (pretty much) "anyone who can download and run a desktop app and who has a relatively modern laptop", which is a big population.
I'm really excited to see what people build on Ollama.
(And Ollama will simplify deploying server-side LLM apps as well, but right now, from participating in the community, it seems most people are thinking of it only for local apps. I expect that to change when people realize they can ship a self-contained server app that runs on a cheap AWS/GCP instance and uses an Ollama-executed LLM for various features.)
[1] Shameless plug for the WIP PR where I'm implementing Ollama support in Cody, our code AI app: https://github.com/sourcegraph/cody/pull/905.
Koboldcpp does this: https://github.com/LostRuins/koboldcpp/releases/tag/v1.44.2
They basically just ship executables for the different llama.cpp backends and select the correct one with a Python script, which is fine, since the executables are really small.
This one is basically SOTA for AMD, if you can get ROCm installed properly:
https://github.com/YellowRoseCx/koboldcpp-rocm
There's a ton of cool opportunity in the runtime layer. I've been keeping my eye on the compiler-based approaches. From what I've gathered, many of the larger "production" inference tools use compilers:
- https://github.com/openai/triton
- https://github.com/NVIDIA/TensorRT
TVM and other compiler-based approaches seem to perform really well and make supporting different backends really easy. A good friend who's been in this space for a while told me llama.cpp is sort of a "hand-crafted" version of what these compilers could output, which I think speaks to the craftsmanship Georgi and the ggml team have put into llama.cpp, but also to the opportunity to "compile" versions of llama.cpp for other model architectures or platforms.