Self-Hosting Open Source LLMs: Cross Devices and Local Deployment of Mistral 7B

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • Using this format likely makes distribution a lot easier: they can use an existing OCI registry to host the images and reuse the existing push/pull primitives. OCI images are often used nowadays for things besides Docker images, whenever you have a format (like a layered filesystem) that is nicely content-addressable.

    The default registry for Ollama is https://ollama.ai/library (I don't know about the API endpoint). I'm not sure if/how you can actually point it at another registry endpoint.

    When people say that something "uses Docker", that usually means running the Docker daemon with its sandboxing, networking, etc. Ollama doesn't do or need any of that: it runs inference natively on GGUF files (plus the other configuration that comes as part of its model file).

    However, as you have noticed, it does borrow heavily from Docker. Apart from the registry mechanism, its Modelfile heavily mimics a Dockerfile.

    [0]: https://github.com/jmorganca/ollama/blob/aabd71aede99443d585...
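    To illustrate the Dockerfile resemblance mentioned above, here is a minimal sketch of an Ollama Modelfile. The directives (`FROM`, `PARAMETER`, `SYSTEM`) are real Modelfile keywords; the model name and values are illustrative assumptions, not taken from the thread.

    ```
    # Minimal Ollama Modelfile sketch — note the Dockerfile-like layout.
    FROM mistral                      # base model, pulled like a base image
    PARAMETER temperature 0.7         # inference setting baked into the model
    SYSTEM "You are a concise technical assistant."
    ```

    Such a file can be built and run with `ollama create my-mistral -f Modelfile` followed by `ollama run my-mistral`, analogous to `docker build` and `docker run`.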

  • https://github.com/second-state/WasmEdge-WASINN-examples/tre... These show the inference speeds on different hardware. You can rent a GPU, which will be much faster.

  • wasi-nn

    Neural Network proposal for WASI

  • I really like the post that they mention (https://www.secondstate.io/articles/fast-llm-inference/). The reasons for avoiding Python all resonate with me. I'm excited to play with WASI-NN (https://github.com/WebAssembly/wasi-nn), and the Rust code for loading a GGUF model is very readable.

NOTE: The number of mentions on this list reflects mentions in shared posts plus user-suggested alternatives; a higher number therefore indicates a more popular project.

Suggest a related project

Related posts

  • Show HN: Rustubble – Beautifull CLI components for your terminal

    1 project | news.ycombinator.com | 5 May 2024
  • Alert for Social Engineering Takeovers of Open Source Projects

    2 projects | news.ycombinator.com | 5 May 2024
  • What is a low/reasonable cost solution for service log storage and querying?

    1 project | news.ycombinator.com | 5 May 2024
  • Verified Rust for low-level systems code

    6 projects | news.ycombinator.com | 4 May 2024
  • Adding search to static websites

    1 project | dev.to | 4 May 2024