Rapid-MLX
ask
| | Rapid-MLX | ask |
|---|---|---|
| Mentions | 6 | 1 |
| Stars | 2,756 | 1 |
| Growth | 90.1% | - |
| Activity | 9.8 | - |
| Last commit | 4 days ago | about 2 months ago |
| Language | Python | Go |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Rapid-MLX
- Chrome's Gemini Nano Prompt API: A Step-by-Step Guide
💡 Make the fallback cheap to operate. The whole point of using Nano on the supported path is reduced cost. If your fallback is GPT-5.5 at $5/M tokens, you've moved the bill, not deleted it. Two patterns work well: (1) route the fallback to a smaller hosted model (Haiku, Gemini Flash, Mistral Small) that matches Nano's "short summarization" sweet spot; (2) for Mac users specifically, run Rapid-MLX as your /api/llm endpoint, so the on-device performance comes from your server's Mac, not theirs. Same thesis as our DeepClaude guide: the harness is one product, the model is another, and you can swap them.
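The "moved the bill, not deleted it" point can be made concrete with a little arithmetic. The sketch below is illustrative only: the workload numbers and the $0.50/M price for a smaller model are assumptions made up for the example, and only the $5/M figure comes from the excerpt above.

```python
def fallback_cost_usd(requests, fallback_share, tokens_per_request, usd_per_million_tokens):
    """Estimated spend on the hosted fallback path only.

    The on-device (Nano) path is treated as free, which is a simplification.
    """
    fallback_tokens = requests * fallback_share * tokens_per_request
    return fallback_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 1M requests, 40% of users cannot run Nano, 500 tokens each.
workload = dict(requests=1_000_000, fallback_share=0.4, tokens_per_request=500)

# $5/M is the figure quoted in the excerpt; $0.50/M is an assumed small-model price.
premium = fallback_cost_usd(**workload, usd_per_million_tokens=5.00)
small = fallback_cost_usd(**workload, usd_per_million_tokens=0.50)

print(f"premium fallback: ${premium:,.2f}")
print(f"small fallback:   ${small:,.2f}")
```

Under these assumed numbers the premium fallback costs about $1,000 and the smaller model about $100 for the same traffic, so the choice of fallback model dominates the bill once a meaningful share of users cannot run Nano.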
- Anthropic is allowing the Claude CLI to run OpenClaw again
> Large-context requests auto-route to a cloud LLM (GPT-5, Claude, etc.) when local prefill would be slow. Routing based on new tokens after cache hit.
> --cloud-model openai/gpt-5 --cloud-threshold 20000
https://github.com/raullenchai/Rapid-MLX
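The routing rule in the excerpt above (decide on new tokens after a cache hit, not on total prompt length) might look roughly like the sketch below. This is not Rapid-MLX's code: the function name `pick_backend`, its parameters, and the strict "greater than" comparison are assumptions for illustration. Only the 20000 default mirrors the `--cloud-threshold` value quoted above.

```python
def pick_backend(prompt_tokens, cached_prefix_tokens, cloud_threshold=20_000):
    """Return "cloud" or "local" for a request (hypothetical helper).

    Only tokens that are not already covered by the prompt cache need to be
    prefilled locally, so the decision uses the uncached remainder.
    """
    new_tokens = max(prompt_tokens - cached_prefix_tokens, 0)
    # Assumption: the threshold is exclusive; the real tool may differ.
    return "cloud" if new_tokens > cloud_threshold else "local"


# A 50k-token prompt with a 45k-token cache hit leaves 5k new tokens.
print(pick_backend(50_000, 45_000))
# The same prompt with a cold cache has 50k new tokens to prefill.
print(pick_backend(50_000, 0))
```

The practical consequence is that a long multi-turn conversation can stay local as long as each turn adds only a small amount of new context.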
- Show HN: Rapid-MLX – Run local LLMs on Mac, 2-3x faster than alternatives
- Gemma 4 on Apple Silicon: 85 tok/s with a pip install
I've verified this end-to-end with structured output (output_type=BaseModel), streaming, multi-turn conversations, and multi-tool workflows. Test suite here.
- vLLM-mlx – 65 tok/s LLM inference on Mac with tool calling and prompt caching
pip install git+https://github.com/raullenchai/vllm-mlx.git
ask
- Anthropic is allowing the Claude CLI to run OpenClaw again
You don't really need to tmux at all for Claude Code CLI. Claude Code CLI supports streaming json input, and streaming json output; you can use stdin/out as a pipe to control Claude Code CLI.
I'm doing this today in https://github.com/Cidan/ask -- works great.
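The general technique described in that comment, driving a child process over newline-delimited JSON on stdin and stdout instead of scraping a terminal, can be sketched as follows. The child here is a stand-in Python echo process, not Claude Code, and the message fields (`type`, `text`, `echo`) are invented for the example. The real CLI's flags and message schema should be taken from its own documentation.

```python
import json
import subprocess
import sys

# Stand-in child: reads one JSON object per line on stdin and answers with
# one JSON object per line on stdout. It is NOT the Claude Code CLI.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"type": "result", "echo": request["text"].upper()}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def run_session(prompts):
    """Send each prompt as a JSON line and collect one JSON reply per prompt."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        for prompt in prompts:
            child.stdin.write(json.dumps({"type": "user", "text": prompt}) + "\n")
            child.stdin.flush()  # make sure the child sees the line immediately
            replies.append(json.loads(child.stdout.readline()))
        child.stdin.close()  # EOF tells the stand-in child to exit
    return replies


if __name__ == "__main__":
    for reply in run_session(["hi", "there"]):
        print(reply)
```

Because each message is a single line of JSON, the parent needs no terminal emulation or screen parsing, which is the advantage over a tmux-based setup that the comment is pointing at.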
What are some alternatives?
Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
openclaw - Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
MindsDB - General-purpose AI designed for knowledge workers — creators, strategists, and operators — and individuals seeking AI systems they can truly control to help them get work done, with full flexibility to extend and deploy anywhere (VPC, on-prem, or cloud).
vibes - A simple mobile-focused chat app to talk to an agent via the ACP protocol
gym - A toolkit for developing and comparing reinforcement learning algorithms.
migrate-openclaw - Migrate your OpenClaw workspace to Claude Code. Converts agents to skills, preserves souls and knowledge bases, generates a clean CLAUDE.md.