Run Mistral 7B on M1 Mac

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llamafile

    Distribute and run LLMs with a single file.
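
    A llamafile bundles the model weights with a llama.cpp server in a single executable; once a Mistral 7B llamafile is started, it serves llama.cpp's HTTP API on localhost:8080 by default. A minimal Python sketch of querying it (the filename in the comment is hypothetical; check the port and payload fields against your llamafile version):

        import json
        import urllib.request

        # Assumes a Mistral 7B llamafile is already running locally, e.g.
        #   ./mistral-7b-instruct.llamafile   (hypothetical filename)
        # By default it exposes llama.cpp's HTTP API on port 8080.
        payload = {
            "prompt": "Explain sliding window attention in one sentence.",
            "n_predict": 128,     # cap on generated tokens
            "temperature": 0.7,
        }
        req = urllib.request.Request(
            "http://localhost:8080/completion",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["content"])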

  • Cgml

    GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.

  • Windows equivalent: https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral...

    Runs on GPUs and uses about 5 GB of VRAM. Integrated GPUs generate 1-2 tokens/second; discrete ones often exceed 20 tokens/second.
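
    Those numbers line up with single-batch token generation being memory-bandwidth bound: every new token requires streaming the full ~5 GB of weights through the GPU. A back-of-envelope sketch in Python (the bandwidth figures are illustrative assumptions, not measurements of any particular GPU):

        # Rough ceiling: tokens/s <= memory bandwidth / bytes read per token.
        # Single-batch decoding streams all ~5 GB of weights once per token.
        # Bandwidth numbers below are illustrative assumptions only.
        WEIGHTS_GB = 5.0

        for name, bandwidth_gb_s in [("integrated GPU", 30.0), ("discrete GPU", 400.0)]:
            ceiling = bandwidth_gb_s / WEIGHTS_GB
            print(f"{name}: at most ~{ceiling:.0f} tokens/s")

        # Real throughput sits well below the ceiling (KV-cache reads, kernel
        # launch overhead), consistent with the 1-2 and 20+ tokens/s above.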

  • ollama-webui

    Discontinued ChatGPT-Style WebUI for LLMs (Formerly Ollama WebUI) [Moved to: https://github.com/open-webui/open-webui]

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • https://ollama.ai/

    Very surprised no one else has said it.

    If you prefer a web UI, see ollama-webui (listed above).
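
    Once the model is pulled (ollama pull mistral), the local server listens on port 11434 and can be scripted against its REST API. A minimal Python sketch (the prompt and model tag are just examples):

        import json
        import urllib.request

        # Assumes the ollama server is running and `ollama pull mistral`
        # has been done; /api/generate is ollama's documented endpoint.
        payload = {
            "model": "mistral",
            "prompt": "Why does Mistral 7B run well on an M1 Mac?",
            "stream": False,  # one JSON object instead of a token stream
        }
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["response"])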

  • llama.cpp

    LLM inference in C/C++

    One thing worth mentioning about llama.cpp wrappers like ollama, LM Studio, and Faraday is that they don't yet support[1] sliding window attention, and instead use the vanilla causal attention from Llama 2. As noted in the Mistral 7B paper[2], SWA gives a longer effective attention span than regular causal attention.

    [1]: https://github.com/ggerganov/llama.cpp/issues/3377
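
    The difference between the two is just the attention mask: vanilla causal attention lets token i attend to every position up to i, while SWA restricts it to the last W positions (W = 4096 in Mistral 7B) and relies on stacked layers to propagate information further back. A toy NumPy sketch of the two masks (sizes are illustrative):

        import numpy as np

        def causal_mask(n: int) -> np.ndarray:
            # Token i may attend to every position j <= i.
            return np.tril(np.ones((n, n), dtype=bool))

        def sliding_window_mask(n: int, window: int) -> np.ndarray:
            # Token i may attend only to j with i - window < j <= i, so a
            # layer costs O(n * window) rather than O(n^2) in attention.
            i = np.arange(n)[:, None]
            j = np.arange(n)[None, :]
            return (j <= i) & (j > i - window)

        print(causal_mask(6).astype(int))
        print(sliding_window_mask(6, 3).astype(int))  # Mistral uses window=4096
        # With k stacked layers, information can still flow back roughly
        # k * window tokens, which is how SWA keeps a long effective span.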

  • OmniQuant

    [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

    Not on iOS. On macOS, I personally think WizardLM 13B v1.2 is a very strong model, and I keep hearing good things about it from users on our Discord and in support emails. Now that there's OmniQuant support for Mixtral models[1], I plan to add support for Mixtral-8x7B-Instruct-v0.1 in the next version of the macOS app; in my tests it looks like a very good all-purpose model that's also pretty good at coding. It's pretty memory hungry (~41GB of RAM), but that's the price of an uncompromising implementation. Existing quantized implementations quantize the MoE gates, leading to a significant increase in perplexity compared with fp16 inference.

    [1]: https://github.com/OpenGVLab/OmniQuant/commit/798467
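
    The gate (router) in a Mixtral-style MoE layer is a tiny linear projection whose top-2 logits decide which experts run, so rounding those few weights can flip the expert selection outright, a discrete error that costs far more than the gate's parameter count suggests. A toy sketch of that failure mode (shapes, seed, and the crude round-to-nearest scheme are all illustrative assumptions; OmniQuant itself is far more careful):

        import numpy as np

        rng = np.random.default_rng(0)
        d_model, n_experts = 16, 8   # toy sizes; Mixtral uses 4096 and 8

        x = rng.standard_normal(d_model).astype(np.float32)
        gate = rng.standard_normal((n_experts, d_model)).astype(np.float32)

        def quantize(w: np.ndarray, bits: int = 3) -> np.ndarray:
            # Crude round-to-nearest quantization, only to exaggerate the
            # effect; real schemes are much better, but gates stay fragile.
            scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
            return np.round(w / scale) * scale

        top2 = lambda logits: set(np.argsort(logits)[-2:])
        print("fp16 experts:     ", top2(gate @ x))
        print("quantized experts:", top2(quantize(gate) @ x))
        # Whenever the selected expert set changes, the layer output comes
        # from entirely different weights, which shows up as a jump in
        # perplexity rather than a small rounding error.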

Related posts

  • OpenAI's New Strategy

    2 projects | /r/ChatGPTPro | 9 Dec 2023
  • Ollama is INSANE - Install custom GPTs within seconds! [Video Tutorial]

    1 project | /r/chatgpt_newtech | 16 Nov 2023
  • Ollama-Webui: ChatGPT-Style Responsive Chat Web UI Client (GUI) for Ollama

    1 project | news.ycombinator.com | 11 Nov 2023
  • Run Large and Small Language Models locally with ollama

    2 projects | dev.to | 7 May 2024
  • Run copilot locally

    3 projects | dev.to | 15 Apr 2024