mmdeploy vs whisper.cpp

| | mmdeploy | whisper.cpp |
|---|---|---|
| Mentions | 4 | 201 |
| Stars | 3,023 | 42,817 |
| Growth | 1.6% | 4.0% |
| Activity | 4.7 | 9.9 |
| Latest commit | 11 months ago | 8 days ago |
| Language | Python | C++ |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mmdeploy
- [D] Object detection models that can be easily converted to CoreML
- Orange Pi 5 Plus Koboldcpp Demo (MPT, Falcon, Mini-Orca, Openllama)
The RK3588 also has an NPU for accelerating neural networks. The bad news is that the API is not supported by any of the inference engines (as far as I know), but the NPU can directly run models that have been converted to the RKNN format. It is a long shot, but you can find details here.
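That RKNN conversion can be sketched with the Python API of Rockchip's rknn-toolkit2. This is a minimal sketch, not a tested recipe: the file names and target platform are assumptions, and the toolkit must actually be installed for it to do anything (the import is guarded so the sketch degrades gracefully without it).

```python
# Sketch of converting an ONNX model to RKNN for the RK3588's NPU.
# Assumes rknn-toolkit2 is installed; "model.onnx" and "model.rknn"
# are placeholder paths.
try:
    from rknn.api import RKNN
except ImportError:
    RKNN = None  # toolkit not installed; conversion is unavailable


def convert_to_rknn(onnx_path: str = "model.onnx",
                    out_path: str = "model.rknn"):
    """Convert an ONNX model to RKNN format, or return None without the toolkit."""
    if RKNN is None:
        return None
    rknn = RKNN()
    rknn.config(target_platform="rk3588")   # target the RK3588's NPU
    rknn.load_onnx(model=onnx_path)
    rknn.build(do_quantization=False)       # quantization needs a calibration dataset
    rknn.export_rknn(out_path)
    rknn.release()
    return out_path
```

The exported `.rknn` file is then loaded on-device with the separate rknn runtime rather than a general-purpose inference engine, which is the point the comment above is making.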
- MMDeploy: Deploy All the Algorithms of OpenMMLab
BibTeX:
@misc{mmdeploy,
  title={OpenMMLab's Model Deployment Toolbox},
  author={MMDeploy Contributors},
  howpublished={\url{https://github.com/open-mmlab/mmdeploy}},
  year={2021}
}
- Removing the bounding box generated by OnnxRuntime segmentation
I have a semantic segmentation model trained using the mmdetection repo. It is then converted to the ONNX format using the mmdeploy repo.
whisper.cpp
- Show HN: OWhisper – Ollama for realtime speech-to-text
Thank you for taking the time to build something and share it. However, what is the advantage of using this over whisper.cpp's stream example, which can also do real-time transcription?
https://github.com/ggml-org/whisper.cpp/tree/master/examples...
- Kitten TTS: 25MB CPU-Only, Open-Source Voice Model
Whisper and the many variants. Here's a good implementation.
https://github.com/ggml-org/whisper.cpp
- Ask HN: What API or software are people using for transcription?
Whisper large v3 from openai, but we host it ourselves on Modal.com. It's easy, fast, no rate limits, and cheap as well.
If you want to run it locally, I'd still go with whisper, then I'd look at something like whisper.cpp https://github.com/ggml-org/whisper.cpp. Runs quite well.
- Whispercpp – Local, Fast, and Private Audio Transcription for Ruby
- Build Your Own Siri. Locally. On-Device. No Cloud
Not the GP, but I found this: https://github.com/ggml-org/whisper.cpp/blob/master/models/c...
- Run LLMs on Apple Neural Engine (ANE)
Actually that's a really good question, I hadn't considered that the comparison here is just CPU vs using Metal (CPU+GPU).
To answer the question though - I think this would be used for cases where you are building an app that wants to utilize a small AI model while at the same time having the GPU free to do graphics related things, which I'm guessing is why Apple stuck these into their hardware in the first place.
Here is an interesting comparison between the two from a whisper.cpp thread - ignoring startup times - the CPU+ANE seems about on par with CPU+GPU: https://github.com/ggml-org/whisper.cpp/pull/566#issuecommen...
- Building a personal, private AI computer on a budget
A great thread with the type of info you're looking for lives here: https://github.com/ggerganov/whisper.cpp/issues/89
But you can likely find similar threads for the llama.cpp benchmark here: https://github.com/ggerganov/llama.cpp/tree/master/examples/...
These are good examples because the llama.cpp and whisper.cpp benchmarks take full advantage of the Apple hardware but also take full advantage of non-Apple hardware with GPU support, AVX support etc.
It's been true for a while now that the memory bandwidth of modern Apple systems, in tandem with the neural cores and GPU, has made them very competitive with Nvidia for local inference and even training.
- Whisper.cpp: Looking for Maintainers
- Show HN: Galene-stt: automatic captioning for the Galene videoconferencing system
- Show HN: Transcribe YouTube Videos
Not as convenient, but you could also have the user manually install the model, like whisper does.
Just forward the error message output by whisper, or even make a more user-friendly error message with instructions on how/where to download the models.
Whisper does provide a simple bash script to download models: https://github.com/ggerganov/whisper.cpp/blob/master/models/...
(As a Windows user, I can run bash scripts via Git Bash for Windows[1])
[1]: https://git-scm.com/download/win
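For anyone who would rather skip bash entirely, the download step boils down to fetching one fixed URL per model name. A minimal Python sketch using only the standard library; the Hugging Face URL pattern mirrors what the whisper.cpp download script points at today, but treat the script itself as the source of truth.

```python
import os
import urllib.request

# Base URL whisper.cpp's download-ggml-model.sh resolves models against
# (an assumption -- check the script for the current location).
BASE = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"


def model_url(name: str) -> str:
    """Map a model name like 'base.en' to its ggml download URL."""
    return f"{BASE}/ggml-{name}.bin"


def download_model(name: str, dest_dir: str = "models") -> str:
    """Download the model if not already present; return its local path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, f"ggml-{name}.bin")
    if not os.path.exists(path):
        urllib.request.urlretrieve(model_url(name), path)
    return path


print(model_url("base.en"))
# → https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

Forwarding a friendly error when the file is missing, as the comment above suggests, is then just a matter of checking `os.path.exists(path)` before launching whisper and printing the URL if it isn't there.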
What are some alternatives?
- FastDeploy - High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
- bark - Text-Prompted Generative Audio Model
- mmfewshot - OpenMMLab FewShot Learning Toolbox and Benchmark
- faster-whisper - Faster Whisper transcription with CTranslate2
- mmselfsup - OpenMMLab Self-Supervised Learning Toolbox and Benchmark
- whisper - Robust Speech Recognition via Large-Scale Weak Supervision