Ask HN: Cheapest way to run local LLMs?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • Hm, it is unclear to me whether you plan to use some Raspberry Pis or your Mac M1.

    In case it's the latter, I recently used Ollama[1] and boy was it good! Installation was a breeze, using models is super easy and performance on my M1 was really good for the Mistral 7B model.

    1: https://ollama.ai/
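
For readers who want to script against Ollama rather than use the CLI, here is a minimal sketch using only the Python standard library. It assumes Ollama is running locally on its default port (11434) and that a model has already been pulled under the name `mistral`. The helper names `build_generate_request` and `generate` are made up for this illustration.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (an assumption about your setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

With the server running, something like `print(generate("mistral", "Why is the sky blue?"))` should print a completion. Setting `"stream": False` keeps the sketch short; the streaming mode returns one JSON object per line instead.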

  • gpu_poor

    Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization

  • Here's a simple calculator for LLM inference requirements: https://rahulschand.github.io/gpu_poor/
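
To get a feel for the kind of arithmetic such a calculator performs, here is a back-of-the-envelope sketch. It is not gpu_poor's actual formula: it only counts the model weights and the key/value cache and ignores activations, framework overhead, and quantization metadata. The layer and head counts in the usage note are illustrative values for a Llama-2-7B-style model.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: parameters times bits per weight, in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,  # 2 bytes per value assumes an fp16 cache
) -> int:
    """Memory for the KV cache: one key and one value vector per layer, head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def to_gib(n_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30
```

For example, `weight_bytes(7e9, 4)` gives 3.5e9 bytes for a 7B model at 4 bits per weight, and `to_gib(kv_cache_bytes(32, 32, 128, 4096))` gives 2.0 GiB of fp16 cache for a 4096-token context. Real usage will be higher, so treat these numbers as a lower bound.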

  • outlines

    Structured Text Generation

  • One of the most powerful ways to integrate LLMs with existing systems is constrained generation. Libraries such as outlines[1] and instructor[2] allow structural specification of the expected outputs as regex patterns, simple types, jsonschema or pydantic models.

    These outputs often consume significantly fewer tokens than chat or text completion.

    [1] https://github.com/outlines-dev/outlines

    [2] https://github.com/jxnl/instructor
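
The core idea behind constrained generation can be shown with a toy example. This is not how outlines or instructor are implemented, and it does not use either library's API: it is a minimal sketch in which a stand-in "model" scores single characters and the decoder only ever picks characters that keep the output a prefix of an allowed answer. The names `constrained_choice` and `toy_scores` are hypothetical.

```python
from typing import Callable, Dict, List


def constrained_choice(
    score_next: Callable[[str], Dict[str, float]],
    choices: List[str],
) -> str:
    """Greedy character-level decoding that can only produce one of `choices`."""
    out = ""
    while out not in choices:
        # Characters that keep `out` a prefix of at least one allowed choice.
        allowed = {c[len(out)] for c in choices if c.startswith(out) and len(c) > len(out)}
        scores = score_next(out)
        # Everything outside `allowed` is masked out; take the best remaining character.
        out += max(sorted(allowed), key=lambda ch: scores.get(ch, float("-inf")))
    return out


def toy_scores(prefix: str) -> Dict[str, float]:
    """Stand-in for a language model: fixed scores, with "m" (as in "maybe") ranked first."""
    return {"m": 3.0, "y": 2.0, "n": 1.0, "e": 0.5, "s": 0.4, "o": 0.3}
```

Left unconstrained, the toy model would start with "m", but `constrained_choice(toy_scores, ["yes", "no"])` returns `"yes"`, because "m" is never allowed. Real libraries apply the same masking idea at the token level, with the allowed set derived from a regex, JSON schema, or type definition.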

  • instructor

    structured outputs for llms

  • swiss_army_llama

    A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

  • Depends what you mean by "local". If you mean in your own home, then there isn't a particularly cheap way unless you have a decent spare machine. If you mean "I get to control everything myself", then you can rent a cheap VPS on a value host like Contabo (you can get 8 cores, 30 GB of RAM, and a 1 TB SSD on Ubuntu 22.04 for something like $35/month; just stick to the US data centers).

    Then, if you want something that is extremely quick and easy to set up and provides a convenient REST API for completions/embeddings with some other nice features, you might want to check out my project here:

    https://github.com/Dicklesworthstone/swiss_army_llama

    Especially if you use Docker to set it up, you can go from a brand new box to a working setup in under 20 minutes and then access it via the Swagger page from any browser.
