Rapid-MLX
prompt-api
| | Rapid-MLX | prompt-api |
|---|---|---|
| Mentions | 6 | 14 |
| Stars | 2,756 | 747 |
| Growth | 90.1% | 5.1% |
| Activity | 9.8 | 6.3 |
| Last commit | 4 days ago | 2 months ago |
| Language | Python | Bikeshed |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Rapid-MLX
- Chrome's Gemini Nano Prompt API: A Step-by-Step Guide
💡 Make the fallback cheap to operate. The whole point of using Nano on the supported path is reduced cost. If your fallback is GPT-5.5 at $5/M tokens, you've moved the bill, not deleted it. Two patterns work well: (1) route the fallback to a smaller hosted model (Haiku, Gemini Flash, Mistral Small) that matches Nano's "short summarization" sweet spot; (2) for Mac users specifically, run Rapid-MLX as your /api/llm endpoint, so Apple Silicon owners get on-device performance via your server's Mac, not theirs. Same thesis as our DeepClaude guide: the harness is one product, the model is another, and you can swap them.
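A minimal sketch of that fallback pattern, assuming the `LanguageModel` global described in Chrome's Prompt API docs (`availability()`, `create()`, `prompt()`, `destroy()`). The helper names `pickRoute` and `summarize`, the prompt wording, and the injected `fallback` callback are hypothetical; treating a not-yet-downloaded model as a fallback case is a design assumption, not something the guide prescribes.

```typescript
// Availability states as described in Chrome's Prompt API docs (assumed here).
type Availability = "unavailable" | "downloadable" | "downloading" | "available";
type Route = "on-device" | "fallback";

// Pure routing decision: use Nano only when the model is ready right now.
// Sending "downloadable" and "downloading" to the fallback is an assumption.
function pickRoute(availability: Availability | undefined): Route {
  return availability === "available" ? "on-device" : "fallback";
}

// Hypothetical wrapper: try the built-in model, otherwise call the fallback.
// `fallback` is supplied by the caller, e.g. a function that POSTs to /api/llm.
async function summarize(
  text: string,
  fallback: (prompt: string) => Promise<string>,
): Promise<string> {
  const prompt = `Summarize briefly:\n${text}`;

  // LanguageModel is not in the standard TS typings, so read it loosely.
  const lm = (globalThis as any).LanguageModel;
  let availability: Availability | undefined;
  try {
    availability = lm ? await lm.availability() : undefined;
  } catch {
    availability = undefined; // any probing error means "use the fallback"
  }

  if (pickRoute(availability) === "on-device") {
    const session = await lm.create();
    try {
      return await session.prompt(prompt);
    } finally {
      session.destroy(); // release the on-device session
    }
  }
  return fallback(prompt);
}
```

Because the caller only sees `summarize`, whatever sits behind the fallback (a small hosted model, or a server running Rapid-MLX) can be swapped without touching page code.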
- Anthropic is allowing the Claude CLI to run OpenClaw again
> Large-context requests auto-route to a cloud LLM (GPT-5, Claude, etc.) when local prefill would be slow. Routing based on new tokens after cache hit. --cloud-model openai/gpt-5 --cloud-threshold 20000
https://github.com/raullenchai/Rapid-MLX
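The routing rule quoted above (decide on the new tokens left after a cache hit, not on total prompt length) can be sketched as follows. This is an illustration, not Rapid-MLX's actual code: the function and field names are hypothetical, and treating the threshold as a strict "greater than" is an assumption. Only the two flag values come from the quote.

```typescript
// Mirrors the two CLI flags from the quote; field names are hypothetical.
interface CloudRouting {
  cloudModel: string;     // value of --cloud-model, e.g. "openai/gpt-5"
  cloudThreshold: number; // value of --cloud-threshold, e.g. 20000
}

type Target = { kind: "local" } | { kind: "cloud"; model: string };

// Route on the tokens that still need prefill after the prompt-cache hit.
function routeRequest(
  promptTokens: number,
  cachedTokens: number,
  cfg: CloudRouting,
): Target {
  const newTokens = Math.max(0, promptTokens - cachedTokens);
  return newTokens > cfg.cloudThreshold
    ? { kind: "cloud", model: cfg.cloudModel }
    : { kind: "local" };
}

const cfg: CloudRouting = { cloudModel: "openai/gpt-5", cloudThreshold: 20000 };

// 30,000-token prompt with 15,000 cached: 15,000 new tokens, stays local.
const warm = routeRequest(30000, 15000, cfg);
// Same prompt with a cold cache: 30,000 new tokens, goes to the cloud model.
const cold = routeRequest(30000, 0, cfg);
```

The point of subtracting cached tokens is that a long conversation with a warm cache can stay on the local model even when its total length is well past the threshold.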
- Show HN: Rapid-MLX – Run local LLMs on Mac, 2-3x faster than alternatives
- Gemma 4 on Apple Silicon: 85 tok/s with a pip install
I've verified this end-to-end with structured output (output_type=BaseModel), streaming, multi-turn conversations, and multi-tool workflows. Test suite here.
- vLLM-mlx – 65 tok/s LLM inference on Mac with tool calling and prompt caching
pip install git+https://github.com/raullenchai/vllm-mlx.git
prompt-api
- Chrome's Gemini Nano Prompt API: A Step-by-Step Guide
The technical name for this is the Prompt API (official spec, Chrome docs). It's been in Chrome's bowels since version 138 — initially behind a flag, now also available as an Origin Trial for production sites. The big news is that a critical mass of developers just figured out it's there, and the demos hitting HN every week (Decaf rewriting comments, Subtitle Insights translating YouTube live, the side-panel UI above) are no longer "look what's possible" — they're "I shipped it last weekend."
- Mozilla's Opposition to Chrome's Prompt API
- Show HN: Apfel – The free AI on your Mac
- Swift on Android: Full Native App Development Now Possible
I guess this is the Dunning-Kruger effect everyone talks about!
To understand just enough to regurgitate what happened, but miss why it happened... and then assume someone who's pointing at the much more relevant why is just plain wrong.
Because the why requires actually understanding things like developer mindshare rather than regurgitating search results.
-
The hint I'll leave, if you're willing to consider that maybe you don't know everything ever... look at whose feedback is being promoted when Chrome wants to do obviously unpopular things on the web: https://github.com/webmachinelearning/prompt-api/blob/main/R...
https://github.com/mozilla/standards-positions/issues/1213
And model for yourself what happens if the developer interest exceeds vendor refusal in magnitude, so Google just ships the thing, without a feature flag, to a massive percentage of the web-going world.
- WebMCP
https://github.com/webmachinelearning/prompt-api
https://developer.chrome.com/docs/ai/built-in :
> Standardization effort: We're working to standardize all of these APIs for cross-browser compatibility.
> The Language Detector API and Translator API have been adopted by the W3C WebML Working Group. We've asked Mozilla and WebKit for their standards positions.
> The Summarizer API, Writer API, and Rewriter API have also been adopted by the W3C WebML Working Group. We've asked Mozilla and WebKit for their standards positions.
webmachinelearning/webmcp:
- Show HN: Grammit – Local-only AI grammar checker (Chrome extension)
I don't know if Vivaldi supports the new Prompt API [0] that Grammit uses to run the local LLM.
As far as I know, the only browsers supporting it currently are Chrome [1] and Edge [2].
[0] https://github.com/webmachinelearning/prompt-api
[1] https://developer.chrome.com/docs/extensions/ai/prompt-api
[2] https://learn.microsoft.com/en-us/microsoft-edge/web-platfor...
- WebGPU enables local LLM in the browser. Demo site with AI chat
There is a Prompt API in development, available in both Chrome and Edge, that gives pages access to a local LLM. Chrome extensions have access to it, and I believe websites can request access as part of an origin trial.
The model is fully managed by the browser. It's currently the Gemini Nano model on Chrome, and they are testing a version of the Gemma 3n model in beta channels. Edge uses phi-4-mini.
You can learn more here: https://github.com/webmachinelearning/prompt-api
- Explainer for the Prompt API: design sketch by the Chrome built-in AI team
- Orbit. Mozilla's AI Assistant for Firefox
I wish they would copy this one https://github.com/webmachinelearning/prompt-api and include a few options for small self-hosted LLMs using WebGPU or some built-in accelerated AI for Firefox.
- Explainer for the Prompt API
What are some alternatives?
Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
RAGatouille - Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
MindsDB - General-purpose AI designed for knowledge workers — creators, strategists, and operators — and individuals seeking AI systems they can truly control to help them get work done, with full flexibility to extend and deploy anywhere (VPC, on-prem, or cloud).
translation-api - 🌏 A proposal for translator and language detector APIs
gym - A toolkit for developing and comparing reinforcement learning algorithms.
llama-explain-extension