llm-awq Alternatives
Similar projects and alternatives to llm-awq
-
FLiPStackWeekly
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
-
CodeGen
CodeGen is a family of open-source models for program synthesis, trained on TPU-v4 and competitive with OpenAI Codex.
-
amazon-bedrock-with-builder-and-command-patterns
A simple yet powerful implementation in Java that lets developers write rather straightforward code to create API requests for the different foundation models supported by Amazon Bedrock.
-
FLaNK-Halifax
Community over Code, Apache NiFi, Apache Kafka, Apache Flink, Python, GTFS, Transit, Open Source, Open Data
-
CoC2023
Community over Code, Apache NiFi, Apache Kafka, Apache Flink, Python, GTFS, Transit, Open Source, Open Data
-
kafka-streams-dashboards
Showcases Grafana dashboards for Kafka Streams applications, leveraging client JMX metrics.
-
nifiConcurrencyDuration
Search the NiFi config for an excessive concurrentlySchedulableTaskCount; if desired, update it to a lower value and increase the processor run duration.
llm-awq reviews and mentions
-
TinyChat: Large Language Model on the Edge
TinyChat is an efficient, lightweight, Python-native serving framework for 4-bit LLMs quantized with AWQ. It delivers a 2.3x generation speedup on an RTX 4090.
Code: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat
- FLaNK Stack Weekly 23 Oct 2023
-
New base model InternLM 7B weights released, with 8k context window.
I am having trouble finding any 8-bit GPTQ models at all; there don't seem to be any on HF. It's almost all 4-bit, with the odd 3-bit version of the big ones. I suspect I will have to make my own for eval purposes, but that's lower priority on my list than finding a 4-bit model that's GPU friendly without such a performance penalty. Looking at AWQ, they have 3- and 4-bit versions.
-
Llama33B vs Falcon40B vs MPT30B
With the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there are also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which manage 3-bit without as much of a performance drop. I hope to see them used more commonly.
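The gap between 3-bit and 4-bit that this comment describes follows directly from the quantization step size. A minimal sketch (my own illustration, not code from any of the linked repos) of round-to-nearest group-wise weight quantization shows the reconstruction error roughly quadrupling when one bit is dropped:

```python
import numpy as np

def quantize_groupwise(w, n_bits, group_size=128):
    """Round-to-nearest uniform quantization with one (scale, zero-point)
    per group of weights; a simplified stand-in for GPTQ/AWQ-style schemes."""
    qmax = 2 ** n_bits - 1
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax
    q = np.round((w - lo) / scale)        # integer codes in [0, qmax]
    return (q * scale + lo).reshape(-1)   # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
for bits in (4, 3):
    mse = np.mean((w - quantize_groupwise(w, bits)) ** 2)
    print(f"{bits}-bit reconstruction MSE: {mse:.6f}")
```

Halving the number of levels doubles the step size, and the mean squared error grows with the square of the step, which is why naive 3-bit hurts so much more and why the 3-bit-capable methods above need cleverer strategies than plain rounding.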
- New hardware-friendly quantization method
-
Activation-Aware Weight Quantization for LLM Compression Outperforms GPTQ
Better quantization would have a direct and meaningful impact for everyone running local LLMs. The technique has already been applied to both Vicuna and the multimodal LLaMA variant LLaVA.
https://github.com/mit-han-lab/llm-awq
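The "activation-aware" idea can be sketched in a few lines of numpy. This is a hedged toy illustration of the principle described in the AWQ paper, not the repo's actual implementation: scale up the weight columns that see large activations before quantizing, and fold the inverse scale into the activations, which is an exact no-op in full precision but shifts quantization error away from the channels that matter. The salient-channel boost, the alpha=0.5 exponent, and all tensor shapes here are arbitrary choices for the demo:

```python
import numpy as np

def quantize_rtn(w, n_bits=4):
    """Per-output-row symmetric round-to-nearest quantization."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(1)
x = rng.normal(size=(256, 64)).astype(np.float32)
x[:, :4] *= 30.0                        # a few "salient" activation channels
W = rng.normal(size=(32, 64)).astype(np.float32)

y_ref = x @ W.T                          # full-precision reference output

# Plain 4-bit round-to-nearest quantization of the weights.
y_plain = x @ quantize_rtn(W).T

# Activation-aware: derive a per-input-channel scale from activation
# statistics, enlarge the corresponding weight columns before quantizing,
# and divide the activations by the same scale to keep the math equivalent.
s = np.abs(x).mean(axis=0) ** 0.5
y_awq = (x / s) @ quantize_rtn(W * s).T

mse_plain = np.mean((y_ref - y_plain) ** 2)
mse_awq = np.mean((y_ref - y_awq) ** 2)
print("plain RTN output MSE:     ", mse_plain)
print("activation-aware output MSE:", mse_awq)
```

The output error of the activation-aware variant comes out lower because the weight columns multiplying large activations are quantized with finer effective resolution; the real method additionally searches over the scaling exponent per layer.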
-
New quantization method AWQ outperforms GPTQ in 4-bit and 3-bit with 1.45x speedup and works with multimodal LLMs
GitHub: https://github.com/mit-han-lab/llm-awq
Stats
mit-han-lab/llm-awq is an open-source project licensed under the MIT License, an OSI-approved license.
The primary programming language of llm-awq is Python.