I made an app that runs Mistral 7B 0.2 LLM locally on iPhone Pros

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. Cgml

    GPU-targeted, vendor-agnostic AI library for Windows, with a Mistral model implementation.

    Is that explanation better? https://github.com/Const-me/Cgml/blob/master/Mistral/Mistral...

    Same Mistral Instruct 0.2 model, different implementation.

  2. Judoscale

    Save 47% on cloud hosting with autoscaling that just works. Judoscale integrates with Django, FastAPI, Celery, and RQ to make autoscaling easy and reliable. Save big, and say goodbye to request timeouts and backed-up task queues.

  3. llama.cpp

    LLM inference in C/C++

    We should be happy that compute is once again improving and machines are getting outdated rapidly. Which is better - a world where your laptop is competitive for 5+ years but everything stays the same? Or one where entire new realms of advancement open up every 18 months?

    It’s no contest: option 2 for me.

    Just use llama.cpp with any of the available UIs. It will be usable with 4-bit quantization on CPU. You can use any of the “Q4_K_M” GGUF models that TheBloke puts out on Hugging Face.

    https://github.com/ggerganov/llama.cpp

    UI projects in description.

    https://huggingface.co/TheBloke

    A closed source option is LMStudio.

    https://lmstudio.ai/
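The 4-bit claim above can be sanity-checked with a back-of-the-envelope calculation. This is a rough sketch: the ~4.5 bits/weight figure for Q4_K_M (4-bit weights plus per-block scales) and the 7.24B parameter count for Mistral 7B are approximations, not exact specs.

```python
# Rough memory estimate for running a 7B model with 4-bit quantization
# (e.g. a Q4_K_M GGUF file from TheBloke) versus a 16-bit baseline.
# 4.5 bits/weight is an approximation for Q4_K_M, not an exact spec.

def gguf_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate model file size in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

fp16 = gguf_size_gib(7.24e9, 16)    # unquantized fp16 baseline
q4   = gguf_size_gib(7.24e9, 4.5)   # ~Q4_K_M

print(f"fp16: ~{fp16:.1f} GiB, Q4_K_M: ~{q4:.1f} GiB")
```

The ~4 GiB Q4_K_M figure is why a quantized 7B model fits in the RAM of a laptop, or of an iPhone Pro, while the fp16 version does not.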

  4. exporters

    Export Hugging Face models to Core ML and TensorFlow Lite

  5. enchanted

    Enchanted is an iOS and macOS app for chatting with private, self-hosted language models such as Llama 2, Mistral, or Vicuna via Ollama.

  6. llama_index

    LlamaIndex is the leading framework for building LLM-powered agents over your data.

    Mistral Instruct does use a system prompt.

    You can see the raw format here: https://www.promptingguide.ai/models/mistral-7b#chat-templat... and you can see how LlamaIndex uses it here (as an example): https://github.com/run-llama/llama_index/blob/1d861a9440cdc9...
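To make the raw format concrete, here is a minimal sketch of the Mistral Instruct chat template described at the promptingguide link. One assumption is labeled in the code: Mistral Instruct has no dedicated system role, so the common convention is to prepend the system prompt to the first user turn inside the first `[INST]` block.

```python
# Sketch of the raw Mistral-7B-Instruct prompt format:
#   <s>[INST] user [/INST] assistant</s>[INST] follow-up [/INST]
# Prepending the system prompt to the first user turn is a convention,
# not part of the template itself (there is no system role).

def format_mistral_prompt(turns, system=None):
    """turns: list of (user, assistant) pairs; the final assistant may be None."""
    prompt = "<s>"
    for i, (user, assistant) in enumerate(turns):
        if i == 0 and system:
            user = f"{system}\n\n{user}"  # assumed convention for system prompts
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant}</s>"
    return prompt

print(format_mistral_prompt(
    [("What is your favourite condiment?", None)],
    system="Always answer in one sentence.",
))
```

Getting these special tokens and spaces exactly right matters: a model fine-tuned on this template degrades noticeably when prompted in a different format.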

  7. swift-transformers

    Swift Package to implement a transformers-like API in Swift

  8. mlx

    MLX: An array framework for Apple silicon

  9. InfluxDB

    InfluxDB high-performance time series database. Collect, organize, and act on massive volumes of high-resolution data to power real-time intelligent systems.

  10. ollama

    Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.

    Ollama (https://ollama.ai/) is a popular choice for running local LLM models and should work fine on Intel. It wraps llama.cpp, so it shouldn't require an M2/M3.
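For anyone trying Ollama from code rather than the CLI, here is a minimal sketch against its local HTTP API. The endpoint and payload fields match Ollama's documented `/api/generate` endpoint; the model name `"mistral"` is an assumption (use whatever `ollama pull` fetched on your machine).

```python
import json

# Build a request payload for Ollama's /api/generate endpoint
# (served at http://localhost:11434 by a running `ollama serve`).
payload = {
    "model": "mistral",       # assumes `ollama pull mistral` was run first
    "prompt": "Why is the sky blue?",
    "stream": False,          # return one JSON object instead of a token stream
}
body = json.dumps(payload)

# To actually send it (requires a running Ollama server), roughly:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
print(body)
```

With `"stream": False` the server returns a single JSON object whose `response` field holds the full completion; omit it to receive newline-delimited JSON chunks instead.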

NOTE: The mention count for each project combines mentions in common posts and user-suggested alternatives, so a higher count indicates a more popular project.


Related posts

  • Machine Learning Summer Schools

    1 project | news.ycombinator.com | 10 Apr 2025
  • 🐉 Loong: Synthesize Long CoTs at Scale through Verifiers

    2 projects | dev.to | 9 Apr 2025
  • Scaling Environments for Agents

    1 project | dev.to | 7 Apr 2025
  • How to build your own MCP servers

    2 projects | dev.to | 5 Apr 2025
  • CAMEL DatabaseAgent: An Open-Source Solution for Converting Complex Data Queries into Natural Conversations

    2 projects | dev.to | 3 Apr 2025
