mlc-llm

Universal LLM Deployment Engine with ML Compilation (by mlc-ai)

mlc-llm Alternatives

Similar projects and alternatives to mlc-llm

  1. llama.cpp

    LLM inference in C/C++

  2. textgen

    887 mlc-llm VS textgen

    Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.

  3. ollama

    750 mlc-llm VS ollama

    Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

  4. whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  5. ROCm

    198 mlc-llm VS ROCm

    Discontinued AMD ROCm™ Software - GitHub Home [Moved to: https://github.com/ROCm/ROCm]

  6. koboldcpp

    Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

  7. gpt4all

    GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

  8. FastChat

    86 mlc-llm VS FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

  9. ggml

    76 mlc-llm VS ggml

    Tensor library for machine learning

  10. llama-cpp-python

    Python bindings for llama.cpp

  11. web-llm

    62 mlc-llm VS web-llm

    High-performance In-browser LLM Inference Engine

  12. exllama

    66 mlc-llm VS exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  13. dalai

    59 mlc-llm VS dalai

    The simplest way to run LLaMA on your local machine

  14. open_llama

    OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

  15. triton

    50 mlc-llm VS triton

    Development repository for the Triton language and compiler

  16. Cgml

    GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.

  17. jsonformer

    25 mlc-llm VS jsonformer

    A Bulletproof Way to Generate Structured JSON from Language Models

  18. CTranslate2

    Fast inference engine for Transformer models

  19. sparsegpt

    Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

  20. SillyTavern

    79 mlc-llm VS SillyTavern

    LLM Frontend for Power Users.

NOTE: The mention counts in this list reflect mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better mlc-llm alternative or higher similarity.

mlc-llm reviews and mentions

Posts with mentions or reviews of mlc-llm. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-12-23.
  • Making AMD GPUs competitive for LLM inference
    10 projects | news.ycombinator.com | 23 Dec 2024
    It depends on what you mean by "this." MLC's catch is that you need to define/compile models for it with TVM. Here is the list of supported model architectures: https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/m...

    llama.cpp has a much bigger supported model list, as does vLLM and of course PyTorch/HF transformers covers everything else, all of which work w/ ROCm on RDNA3 w/o too much fuss these days.

    For inference, the biggest caveat is that Flash Attention is only available as an aotriton implementation, which, besides sometimes being less performant, also doesn't support SWA. For CDNA there is a better CK-based version of FA, but CK doesn't have RDNA support. A couple of people at AMD are apparently working on native FlexAttention, so I guess we'll see how that turns out.

    (Note the recent SemiAccurate piece was on training, which I'd agree is in a much worse state; I have personal experience with it often being broken for even the simplest distributed training runs.) Funnily enough, if you're running simple fine-tunes on a single RDNA3 card, you'll probably have a better time. OOTB, a 7900 XTX will train at about the same speed as an RTX 3090 (4090s blow both of those away, but you'll probably want more cards and VRAM, or to just move to H100s).

  • FLaNK 04 March 2024
    26 projects | dev.to | 4 Mar 2024
  • AI on an Android phone?
    2 projects | /r/LocalLLaMA | 8 Dec 2023
    This one uses the GPU, but it doesn't support Mistral yet: https://github.com/mlc-ai/mlc-llm
  • MLC vs llama.cpp
    2 projects | /r/LocalLLaMA | 7 Nov 2023
    I have tried running Mistral 7B with MLC on my M1 (Metal), and it kept crashing (GitHub issue with description). Memory inefficiency problems.
  • [Project] Scaling LLama2 70B with Multi NVIDIA and AMD GPUs under 3k budget
    1 project | /r/LocalLLaMA | 21 Oct 2023
    Project: https://github.com/mlc-ai/mlc-llm
  • Scaling LLama2-70B with Multi Nvidia/AMD GPU
    2 projects | news.ycombinator.com | 19 Oct 2023
  • AMD May Get Across the CUDA Moat
    8 projects | news.ycombinator.com | 6 Oct 2023
    For LLM inference, a shoutout to MLC LLM, which runs LLM models on basically any API that's widely available: https://github.com/mlc-ai/mlc-llm
  • ROCm Is AMD's #1 Priority, Executive Says
    5 projects | news.ycombinator.com | 26 Sep 2023
    One of your problems might be that gfx1032 is not supported by AMD's ROCm packages, which has a laughably short list of supported hardware: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h...

    The normal workaround is to assign the closest supported architecture, e.g., gfx1030, so `HSA_OVERRIDE_GFX_VERSION=10.3.0` might help.

    Also, it looks like some of your tested projects are OpenCL? For me, I do something like: `yay -S rocm-hip-sdk rocm-ml-sdk rocm-opencl-sdk` to cover all the bases.

    My recent interest has been LLMs, and here is my general step-by-step guide for those (llama.cpp, exllama), for anyone interested: https://llm-tracker.info/books/howto-guides/page/amd-gpus

    I didn't port the docs back in, but also here's a step-by-step w/ my adventures getting TVM/MLC working w/ an APU: https://github.com/mlc-ai/mlc-llm/issues/787

    From my experience, ROCm is improving, but there's a good reason that Nvidia has 90% market share even at big price premiums.

  • Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration
    14 projects | news.ycombinator.com | 26 Sep 2023
    Maybe they're talking about https://github.com/mlc-ai/mlc-llm which is used for web-llm (https://github.com/mlc-ai/web-llm)? Seems to be using TVM.
  • Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
    8 projects | news.ycombinator.com | 12 Sep 2023
    you already have TVM for the cross platform stuff

    see https://tvm.apache.org/docs/how_to/deploy/android.html

    or https://octoml.ai/blog/using-swift-and-apache-tvm-to-develop...

    or https://github.com/mlc-ai/mlc-llm

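The `HSA_OVERRIDE_GFX_VERSION` workaround from the ROCm comment above can be scripted so the override applies only to the program being tested. The following is a minimal Python sketch, not part of ROCm or mlc-llm. It assumes gfx target names follow the common `gfx<major><minor><stepping>` pattern, with the stepping as one hex digit (so `gfx90a` maps to `9.0.10`). It also uses a hypothetical "closest architecture" heuristic that simply zeroes the stepping, which turns `gfx1032` into `10.3.0` and matches the gfx1030 suggestion. The helper names are illustrative. Whether an override actually works depends on how compatible the two ISAs really are.

```python
import os
import re
import subprocess
import sys


def gfx_to_hsa_version(target: str) -> str:
    """Convert a gfx target name such as 'gfx1030' into the
    'major.minor.stepping' form used by HSA_OVERRIDE_GFX_VERSION.

    Assumption: the name is 'gfx' + major (1-2 digits) + minor (1 digit)
    + stepping (1 hex digit), e.g. gfx906 -> 9.0.6, gfx90a -> 9.0.10.
    """
    match = re.fullmatch(r"gfx(\d{1,2})(\d)([0-9a-f])", target.lower())
    if match is None:
        raise ValueError(f"unrecognized gfx target: {target!r}")
    major, minor, stepping = match.groups()
    return f"{int(major)}.{int(minor)}.{int(stepping, 16)}"


def closest_override(target: str) -> str:
    """Hypothetical heuristic: keep major.minor and zero the stepping,
    e.g. gfx1032 -> 10.3.0 (i.e. report the GPU as gfx1030)."""
    major, minor, _ = gfx_to_hsa_version(target).split(".")
    return f"{major}.{minor}.0"


def run_with_override(cmd: list[str], target: str) -> subprocess.CompletedProcess:
    """Run cmd with HSA_OVERRIDE_GFX_VERSION set in the child's environment only."""
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = closest_override(target)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Example: a child Python process reports the override it received.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['HSA_OVERRIDE_GFX_VERSION'])"]
    result = run_with_override(probe, "gfx1032")
    print(result.stdout.strip())  # 10.3.0
```

Setting the variable in a copied environment, rather than exporting it shell-wide, keeps the override scoped to the one process. This makes it easy to compare behavior with and without it.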

Stats

Basic mlc-llm repo stats
Mentions: 90
Stars: 22,784
Activity: 9.0
Last commit: about 1 month ago

mlc-ai/mlc-llm is an open source project licensed under Apache License 2.0 which is an OSI approved license.

The primary programming language of mlc-llm is Python.
