Stretch iPhone to Its Limit: 2GiB Stable Diffusion Model Runs Locally on Device

This page summarizes the projects mentioned and recommended in the original post on

Our great sponsors
  • Onboard AI - Learn any GitHub repo in 59 seconds
  • InfluxDB - Collect and Analyze Billions of Data Points in Real Time
  • SaaSHub - Software Alternatives and Reviews
  • Snappy

    A fast compressor/decompressor

    It doesn't destroy performance for the simple reason that nowadays memory access has higher latency than pure compute. If you need to use compute to produce some data to be stored in memory, your overall throughput could very well be faster than without compression.

    There have been a large amount of innovation on fast compression in recent years. Traditional compression tools like gzip or xz are geared towards higher compression ratio, but memory compression tends to favor speed. Check out those algorithms:

    * lz4:

    * Google's snappy:

    * Facebook's zstd in fast mode:

  • xnu

    iOS uses memory compression but not swap. iOS devices actually have special CPU instructions to speed up compression of up to page size increments specifically to aid in this model [1]


  • Onboard AI

    Learn any GitHub repo in 59 seconds. Onboard AI learns any GitHub repo in minutes and lets you chat with it to locate functionality, understand different parts, and generate new code. Use it for free at

  • InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.

  • stable-diffusion-webui

    Stable Diffusion web UI

    Automatic1111's stable diffusion web gui works with apple silicon:

  • maple-diffusion

    Stable Diffusion inference on iOS / macOS using MPSGraph

    yeah, running the full decoder takes a while. though, since the "latent" is just 4 channels and pretty close to representing RGB, you can use a linear combination of latent channels and get a basic (grainy, low-res) preview image like this [0] without much trouble. I expect you could go further, and train a shallow conv-only decoder to get nicer preview results, but I'm not sure if anyone's bothered yet.


  • InfluxDB

    Collect and Analyze Billions of Data Points in Real Time. Manage all types of time series data in a single, purpose-built database. Run at any scale in any environment in the cloud, on-premises, or at the edge.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts