While llama3-8b might be slightly more brittle under quantization, llama3-70b really surprised me and others[1] with how well it performs even in the 2-3 bits-per-parameter regime. It requires one of the most advanced quantization methods (IQ2_XS specifically), but the reward is a SoTA LLM that fits on one 4090 GPU and enables advanced use cases such as powering the agent engine I'm working on: https://github.com/kir-gadjello/picoagent-rnd
For me it completely replaced strong models such as Mixtral-8x7B and DeepSeek-Coder-Instruct-33B.
1. https://www.reddit.com/r/LocalLLaMA/comments/1cst400/result_...
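To make the "fits on one 4090" claim concrete: a minimal sketch of running such a model, assuming llama-cpp-python and a hypothetical IQ2_XS GGUF filename (the back-of-envelope arithmetic in the comments is approximate and ignores KV-cache overhead):

    # Rough memory check: 70e9 params * ~2.3 bits / 8 bits-per-byte
    # ≈ 20 GB for the weights alone, within a 4090's 24 GB of VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="Meta-Llama-3-70B-Instruct.IQ2_XS.gguf",  # hypothetical filename
        n_gpu_layers=-1,  # offload all layers to the GPU
        n_ctx=8192,       # context length trades against remaining VRAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain IQ2_XS in one sentence."}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])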
-
Yes, but please try to avoid repetition on HN. The GP's response was rude and broke the site guidelines, but https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... does look excessive to me.