Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B

This page summarizes the projects mentioned and recommended in the original post on dev.to

SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  1. llama.cpp

    LLM inference in C/C++

    Binary: vanilla llama.cpp, build-mtp/bin/llama-server, built from the MTP PR branch (commit ebe4fca, PR #22673). PR #22673 merged to master on 2026-05-16, so any master checkout after that date ships --spec-type mtp natively.

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  3. lucebox-hub

    Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware.

    The Dell T5820 install was the hardware story (companion post forthcoming). DFlash was the software follow-up. Initial scan of Luce-Org/lucebox-hub (advertising 3.43x decode + 10x TTFT on RTX 3090) ran into the same blocker: their daemon is a raw generate primitive with no OpenAI API, no jinja chat templates, no tool calling. Slotting it behind Hermes/k2 would need a chat-template shim written from scratch.

  4. beellama.cpp

    DFlash & TurboQuant in llama.cpp with up to 3x faster generation and 7.5x more KV cache in same VRAM

    BeeLlama.cpp by Anbeeld already had the shim baked in: DFlash speculative decoding, TurboQuant KV cache, and CopySpec fallback layered onto the OpenAI server with --jinja and tool-call detection preserved. Different binary. Same flags Hermes needed.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • PFlash: 10x prefill speedup over llama.cpp at 128K on a RTX 3090

    1 project | news.ycombinator.com | 1 May 2026
  • How to Setup a Local Coding Agent on macOS

    5 projects | news.ycombinator.com | 12 Jun 2026
  • Mtplx 1.0.0

    1 project | news.ycombinator.com | 11 Jun 2026
  • Show HN: Local AI server with persistent memory, RAG and plugins

    1 project | news.ycombinator.com | 27 May 2026
  • BeeLlama v0.2.0: 164 tok/s on a 27B model, one RTX 3090

    1 project | dev.to | 22 May 2026

Did you know that C++ is
the 7th most popular programming language
based on number of references?