-
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
ExLlama, for example, uses buffers on each card, which reduce the amount of VRAM available for the model and context; see https://github.com/turboderp/exllama/issues/121
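To make the point about per-card buffers concrete, here is a minimal sketch of the arithmetic. The function name `usable_vram_per_gpu` and all figures are hypothetical, chosen only for illustration; they are not taken from ExLlama or from the linked issue, and real buffer sizes depend on the model and settings.

```python
def usable_vram_per_gpu(total_gb, reserved_gb):
    """Return the VRAM (in GB) left for weights and context on each card.

    total_gb    -- total VRAM per card
    reserved_gb -- VRAM taken by per-card buffers (hypothetical values)
    """
    if len(total_gb) != len(reserved_gb):
        raise ValueError("need one reserved value per GPU")
    # Clamp at zero so an oversized buffer never yields a negative budget.
    return [max(t - r, 0.0) for t, r in zip(total_gb, reserved_gb)]


# Hypothetical two-card setup: 24 GB each, 1.5 GB of buffers per card.
cards = [24.0, 24.0]
buffers = [1.5, 1.5]
split = usable_vram_per_gpu(cards, buffers)
print(split)       # [22.5, 22.5]
print(sum(split))  # 45.0
```

Under these assumed numbers, the two cards offer 45 GB for model and context rather than the nominal 48 GB, which is why a split that looks fine on paper may still run out of memory.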
Related posts
-
HuggingFace hacked – Space secrets leak disclosure
-
Shellgpt: Chat with LLM in your terminal, be it shell generator, story teller
-
Omost: A project to convert LLM's coding capability to image generation
-
Take control! Run ChatGPT and Github Copilot yourself!
-
The DevRel Digest May 2024: Documentation and the Developer Journey