-
Does this translate to other models, or was Whisper cherry-picked due to its serial nature and integer math? Looking at https://github.com/ml-explore/mlx-examples/tree/main/stable_... seems to hint that this is the case:
>At the time of writing this comparison convolutions are still some of the least optimized operations in MLX.
I think the main thing at play is the fact that you can have 64+ GB of very fast RAM directly coupled to the CPU/GPU, and the benefits of that from a latency/co-accessibility point of view.
These numbers are certainly impressive when you look at the power packages of these systems.
Worth noting that the cost of an M3 Max system with the minimum RAM config is ~2x the price of a 4090...
-
How does this compare to insanely-fast-whisper though? https://github.com/Vaibhavs10/insanely-fast-whisper
I think that not using optimizations allows this to be a 1:1 comparison, but if the optimizations are not ported to MLX, then it would still be better to use a 4090.
Having looked at MLX recently, I think it's definitely going to get traction on Macs - and iOS when Swift bindings are released https://github.com/ml-explore/mlx/issues/15 (although there might be some C++20 compilation issue blocking right now).
-
Could someone elaborate on how this is accomplished, and whether there is any quality disparity compared to the original Whisper?
With repos like https://github.com/SYSTRAN/faster-whisper it makes immediate sense why they're faster than the original, but with this one, not so much, especially considering it's much faster still.
-
cog-whisper-diarization
Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote
I'll take this opportunity to ask for help: What's a good open source transcription and diarization app or workflow?
I looked at https://github.com/thomasmol/cog-whisper-diarization and https://about.transcribee.net/ (from the people behind Audapolis), but neither works that well -- crashes, etc.
Thank you!
-
https://github.com/collabora/WhisperLive
There is another one that uses Hugging Face's implementation, but I haven't tried it since my hardware doesn't support Flash Attention 2.