| | TensorRT-LLM | htmx |
|---|---|---|
| Mentions | 14 | 569 |
| Stars | 6,890 | 33,428 |
| Growth | 10.8% | 5.3% |
| Activity | 8.4 | 9.6 |
| Latest commit | 5 days ago | 6 days ago |
| Language | C++ | JavaScript |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
TensorRT-LLM
- Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B
Yes, we are also looking at integrating MLX [1], which is optimized for Apple Silicon and built by an incredible team of individuals, a few of whom were behind the original Torch [2] project. There's also TensorRT-LLM [3] by Nvidia, optimized for their recent hardware.
All of this of course acknowledging that llama.cpp is an incredible project with competitive performance and support for almost any platform.
[1] https://github.com/ml-explore/mlx
[2] https://en.wikipedia.org/wiki/Torch_(machine_learning)
[3] https://github.com/NVIDIA/TensorRT-LLM
- FLaNK AI for 11 March 2024
- FLaNK Stack 26 February 2024
NVIDIA GPU LLM https://github.com/NVIDIA/TensorRT-LLM
- FLaNK Stack Weekly 19 Feb 2024
- Nvidia Chat with RTX
https://github.com/NVIDIA/TensorRT-LLM
It's quite a thin wrapper that puts both projects into %LocalAppData%, along with a miniconda environment with the correct dependencies installed. Also, for some reason it downloads both LLaMA 13B (24.5 GB) and Mistral 7B (13.6 GB) but only installed Mistral?
Mistral 7B runs about as accurately as I remember, but responses are faster than I can read. This seems to come at the cost of context and variance/temperature - although it's a chat interface, the implementation doesn't seem to take previous questions or answers into account. Asking it the same question also gives the same answer.
The RAG (LlamaIndex) is okay, but a little suspect. The installation comes with a default dataset folder containing text files of Nvidia marketing materials. When I asked questions about the files, it often cited the wrong file even when it gave the right answer.
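The identical answers to repeated questions point to greedy decoding (temperature 0). A minimal Python sketch of the difference, with made-up logit values:

```python
import math
import random

def sample(logits, temperature=1.0, rng=random):
    """Pick a token index from raw logits; temperature=0 means greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(logits) - 1  # guard against float rounding

logits = [2.0, 1.0, 0.1]
print(sample(logits, temperature=0))   # always the argmax: same question, same answer
print(sample(logits, temperature=1.0)) # varies run to run
```

With temperature 0 the output is fully deterministic, which matches the behavior described above; any positive temperature reintroduces variance.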
- Nvidia's Chat with RTX is a promising AI chatbot that runs locally on your PC
Yeah, seems a bit odd, because the TensorRT-LLM repo lists Turing as a supported architecture.
https://github.com/NVIDIA/TensorRT-LLM?tab=readme-ov-file#pr...
- MK1 Flywheel Unlocks the Full Potential of AMD Instinct for LLM Inference
I support any progress to erode the Nvidia monopoly.
That said, from what I'm seeing here, the free and open-source TensorRT-LLM[0] (less other aspects of the CUDA stack, of course) almost certainly bests this implementation on the Nvidia hardware they reference for comparison.
I don't have an A6000, but as an example, with the tensorrt_llm backend for Nvidia Triton Inference Server (also free and open source) I get roughly 30 req/s with Mistral 7B on my RTX 4090, with significantly lower latency. Comparison benchmarks are tough, especially when published benchmarks like these are fairly scant on the real details.
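Req/s figures only mean much alongside concurrency and per-request latency. A toy sketch of measuring both (the request here is simulated with a sleep; a real benchmark would swap in an actual Triton/HTTP client call):

```python
import asyncio
import time

async def fake_request(latency_s):
    # Stand-in for an inference call; replace with a real client request.
    await asyncio.sleep(latency_s)

async def bench(n_requests=100, concurrency=10, latency_s=0.05):
    """Fire n_requests with bounded concurrency; return (req/s, avg latency)."""
    sem = asyncio.Semaphore(concurrency)
    latencies = []

    async def one():
        async with sem:
            t0 = time.perf_counter()
            await fake_request(latency_s)
            latencies.append(time.perf_counter() - t0)

    t0 = time.perf_counter()
    await asyncio.gather(*(one() for _ in range(n_requests)))
    wall = time.perf_counter() - t0
    return n_requests / wall, sum(latencies) / len(latencies)

rps, avg_lat = asyncio.run(bench())
print(f"{rps:.0f} req/s, {avg_lat * 1000:.0f} ms avg latency")
```

The point of the sketch: throughput scales with concurrency while per-request latency does not, which is why a single req/s number without latency context is hard to compare across published benchmarks.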
TensorRT-LLM has only been public for a few months, and if you peruse the docs, PRs, etc., you'll see they have many more optimizations in the works.
In typical Nvidia fashion, TensorRT-LLM runs on any Nvidia card (from laptop to datacenter) going back to Turing (five-year-old cards), assuming you have the VRAM.
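The VRAM caveat is easy to estimate: weight memory is roughly parameter count times bytes per parameter, before KV cache and activations. A back-of-the-envelope sketch (the formula deliberately ignores that extra overhead):

```python
def weight_vram_gib(n_params_billion, bits_per_param):
    """Approximate VRAM for model weights alone (excludes KV cache/activations)."""
    total_bytes = n_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1024**3

# A 7B model at FP16 vs an 8-bit format (INT8/FP8 halve FP16 weight memory).
for bits in (16, 8):
    print(f"7B @ {bits}-bit ~ {weight_vram_gib(7, bits):.1f} GiB")
```

So a 7B model needs roughly 13 GiB for FP16 weights alone, which is why quantization is what makes the "any card with enough VRAM" claim practical on consumer GPUs.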
You can download and run this today, free and "open source" for these implementations at least. I'm extremely skeptical of the claim that "MK1 Flywheel has the Best Throughput and Latency for LLM Inference on NVIDIA". You'll note they compare to vLLM, which is an excellent and incredible project, but if you look at vLLM vs Triton w/ TensorRT-LLM the performance improvements are dramatic.
Of course it's the latest and greatest ($$$$$$ and unobtanium) but one look at H100/H200 performance[3] and you can see what happens when the vendor has a robust software ecosystem to help sell their hardware. Pay the Nvidia tax on the frontend for the hardware, get it back as a dividend on the software.
I feel like MK1 must be aware of TensorRT-LLM but of course those comparison benchmarks won't help sell their startup.
[0] - https://github.com/NVIDIA/TensorRT-LLM
[1] - https://github.com/triton-inference-server/tensorrtllm_backe...
[2] - https://mkone.ai/blog/mk1-flywheel-race-tuned-and-track-read...
[3] - https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source...
- FP8 quantized results are bad compared to int8 results
I have followed the instructions on https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama to convert the float16 Llama 2 13B to FP8 and build a TensorRT-LLM engine.
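For reference, the FP8 path in that example generally has two steps: quantize the checkpoint, then build the engine. A hedged sketch of the general shape only — exact scripts, flags, and paths vary by TensorRT-LLM version, and the ones below are assumptions that may not match your checkout:

```shell
# Step 1: quantize the HF float16 checkpoint to FP8
# (flags here are illustrative; check your version's examples/quantization docs).
python examples/quantization/quantize.py \
    --model_dir ./llama-2-13b-hf \
    --dtype float16 \
    --qformat fp8 \
    --output_dir ./llama-2-13b-fp8-ckpt

# Step 2: build the TensorRT-LLM engine from the quantized checkpoint.
trtllm-build \
    --checkpoint_dir ./llama-2-13b-fp8-ckpt \
    --output_dir ./llama-2-13b-fp8-engine
```

Note that FP8 calibration quality (e.g., calibration dataset and size) can noticeably affect accuracy, which is one plausible source of FP8-vs-INT8 quality gaps.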
- Optimum-NVIDIA - 28x faster inference in just 1 line of code !?
- Incoming: TensorRT-LLM version 0.6 with support for MoE, new models and more quantization
htmx
- Hanami and HTMX - progress bar
Hi there! I want to show off a little feature I made using hanami, htmx and a little bit of redis + sidekiq.
- Migrating Next.js App to Go + Templ & HTMX
Recently, I rewrote one of my applications, Stashbin, from Next.js to Go. Though my main motivation for this migration was to learn Go and experiment with HTMX, I was also aiming to reduce my application's resource usage and simplify the deployment process. Initially, the Stashbin codebase was split into two separate repositories: one for the frontend using Next.js, and another for the backend, which already used Go. The backend repository is just a REST API responsible for storing and retrieving data from the database.
- 🕸️ Web development trends we will see in 2024 👀
HTMX is another library that gained popularity thanks to its server-first approach to rendering data, appealing to developers seeking a much simpler way to build interactive pages.
- Reusable Input Datalist
When I work with HTMX I need isolated components that can be reused in a form. So I created a PHP function that generates the input datalist.
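The same idea, sketched in Python for illustration (the author's version is PHP; the function name and markup below are my own, not from the post):

```python
from html import escape

def input_datalist(name, options, value=""):
    """Render a reusable <input> + <datalist> pair as an HTML fragment."""
    opts = "".join(f'<option value="{escape(o)}"></option>' for o in options)
    return (
        f'<input name="{escape(name)}" list="{escape(name)}-list" '
        f'value="{escape(value)}">'
        f'<datalist id="{escape(name)}-list">{opts}</datalist>'
    )

print(input_datalist("fruit", ["apple", "banana"]))
```

Because htmx swaps raw HTML, a server-side helper like this is all the "component" machinery needed; escaping the inputs keeps the fragment safe to inject.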
- HTMZ inspired form submission
I was inspired by htmz (which was in turn inspired by htmx) and how the author got pretty close to a basic htmx-like experience using just an iframe. I wanted to push it a little further, so I whipped this demo together. My submission demonstrates progressive enhancement for the form - with JS enabled, the request targets an iframe that is inserted into the DOM, meaning the page doesn't actually navigate (similar to event.preventDefault()). The iframe receives the HTML response from the request and, on load, triggers a function to swap its contents into the main page.
- Example Java Application with Embedded Jetty and a htmx Website
As described on htmx.org: "htmx gives you access to AJAX, CSS Transitions, WebSockets and Server Sent Events directly in HTML, using attributes, so you can build modern user interfaces with the simplicity and power of hypertext"
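In other words, the server responds with HTML fragments rather than JSON. A minimal stdlib-Python sketch of a handler an hx-get could target (the /count endpoint and markup are invented for illustration, not from htmx.org):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_fragment(count):
    # htmx swaps this fragment straight into the page: no JSON, no client templating.
    return f'<span id="count">{count}</span>'

class Handler(BaseHTTPRequestHandler):
    count = 0

    def do_GET(self):
        if self.path == "/count":  # hypothetical target of <button hx-get="/count">
            Handler.count += 1
            body = render_fragment(Handler.count).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To serve: HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

The design point is that the "API" and the UI are the same thing: the server owns the markup, and the htmx attributes on the page decide where the returned fragment lands.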
- Show HN: ZakuChess, an open source web game built with Django, Htmx and Tailwind
Apart from the source code itself, the repo's README also gives a bit more details about the various packages I used.
1. htmx: https://htmx.org/
- Show HN: Alpine Ajax – If Htmx and Alpine.js Had a Baby
Also, there’s some response header juggling you have to do when submitting forms that have a validation step before redirecting: https://github.com/bigskysoftware/htmx/issues/369
I’ve tried to iron out any footguns or server requirements I’ve bumped into while using HTMX & Hotwire in my projects.
- 🤓 My top 3 Go packages that I wish I'd known about earlier
✨ In recent months, I have been developing web projects using the GOTTHA stack: Go + Templ + Tailwind CSS + htmx + Alpine.js. As soon as I'm ready to talk about all the subtleties and pitfalls, I'll post it on my social networks.
- FLaNK Stack 26 February 2024
What are some alternatives?
ChatRTX - A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
Alpine.js - A rugged, minimal framework for composing JavaScript behavior in your markup.
gpt-fast - Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Vue.js - This is the repo for Vue 2. For Vue 3, go to https://github.com/vuejs/core
optimum-nvidia
astro - The web framework for content-driven websites. ⭐️ Star to support our work!
stable-fast - Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
unpoly - Progressive enhancement for HTML
tensorrtllm_backe
react-snap - 👻 Zero-configuration framework-agnostic static prerendering for SPAs
daytona - The Open Source Dev Environment Manager.
django-unicorn - The magical reactive component framework for Django ✨