-
Does this translate to other models, or was Whisper cherry-picked because of its serial nature and integer math? Looking at https://github.com/ml-explore/mlx-examples/tree/main/stable_... seems to hint that this is the case:
>At the time of writing this comparison convolutions are still some of the least optimized operations in MLX.
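If you want to sanity-check that claim on your own machine, a minimal timing sketch might look like the following. The shapes are loosely modeled on Whisper's first encoder convolution (80 mel bins -> 1280 channels for the large model); the iteration count and the FLOP-matched matmul used for comparison are arbitrary choices of mine, not anything from the repo.

```python
# Rough micro-benchmark: time MLX's conv1d against a roughly FLOP-matched matmul
# to see whether convolutions lag behind on this machine.
import time
import mlx.core as mx

def bench(fn, *args, iters=50):
    # MLX is lazy, so force evaluation to get a real wall-clock number.
    mx.eval(fn(*args))                      # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        mx.eval(fn(*args))
    return (time.perf_counter() - start) / iters

# Conv shapes loosely modeled on Whisper's encoder stem.
x = mx.random.normal((1, 3000, 80))         # (batch, length, in_channels)
w = mx.random.normal((1280, 3, 80))         # (out_channels, kernel, in_channels)

# Matmul with roughly the same multiply-accumulate count as the conv above.
a = mx.random.normal((3000, 1280))
b = mx.random.normal((1280, 240))

print("conv1d :", bench(mx.conv1d, x, w))
print("matmul :", bench(mx.matmul, a, b))
```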
I think the main thing at play is that you can have 64+ GB of very fast RAM directly coupled to the CPU/GPU, and the latency and shared-access benefits that come with that.
These numbers are certainly impressive when you consider the power envelopes of these systems.
Worth noting that an M3 Max system with the minimum RAM configuration costs roughly twice as much as a 4090...
-
How does this compare to insanely-fast-whisper, though? https://github.com/Vaibhavs10/insanely-fast-whisper
Not using optimizations keeps this a 1:1 comparison, but if those optimizations are never ported to MLX, a 4090 would still be the better choice in practice.
Having looked at MLX recently, I think it's definitely going to get traction on Macs, and on iOS once Swift bindings are released (https://github.com/ml-explore/mlx/issues/15), although a C++20 compilation issue might be blocking that right now.
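For context, insanely-fast-whisper is essentially a thin wrapper around the Hugging Face ASR pipeline with chunk batching and FlashAttention-2. A rough sketch of that setup (model id and flags are from memory and may not match the repo exactly):

```python
# Sketch of the batched Hugging Face pipeline that insanely-fast-whisper wraps.
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},  # requires flash-attn installed
)

out = pipe(
    "audio.wav",
    chunk_length_s=30,    # split long audio into chunks so batches can be formed
    batch_size=24,        # most of the speedup on a 4090 comes from batching these chunks
    return_timestamps=True,
)
print(out["text"])
```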
-
Could someone elaborate on how this is accomplished, and whether there is any quality disparity compared to the original Whisper?
Repos like https://github.com/SYSTRAN/faster-whisper make immediate sense as to why they're faster than the original, but this one not so much, especially considering it's even faster still.
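On the quality-disparity part of the question: one empirical check is to run the same audio through both implementations and score one transcript against the other, e.g. with jiwer (a sketch; the file names are placeholders):

```python
# Hypothetical parity check: measure how much the MLX transcript deviates
# from the reference (original Whisper) transcript of the same audio.
from jiwer import wer, cer

with open("transcript_openai.txt") as f:      # output of the reference Whisper
    reference = f.read().strip()
with open("transcript_mlx.txt") as f:         # output of the MLX port
    hypothesis = f.read().strip()

print(f"WER vs reference: {wer(reference, hypothesis):.3%}")
print(f"CER vs reference: {cer(reference, hypothesis):.3%}")
```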
-
I'll take this opportunity to ask for help: what's a good open-source transcription and diarization app or workflow?
I looked at https://github.com/thomasmol/cog-whisper-diarization and https://about.transcribee.net/ (from the people behind Audapolis), but neither works that well (crashes, etc.).
Thank you!
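Not a polished app, but the workflow those tools implement is roughly: transcribe, diarize, then merge by time overlap. A sketch using faster-whisper and pyannote (the model names, HF token handling, and overlap heuristic are my own assumptions):

```python
# Transcribe-then-diarize sketch: faster-whisper for text, pyannote for speaker
# turns, then assign each segment the speaker whose turn overlaps it most.
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

audio = "meeting.wav"

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, _ = model.transcribe(audio)
segments = list(segments)

diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)
turns = [(t.start, t.end, spk)
         for t, _, spk in diarizer(audio).itertracks(yield_label=True)]

def speaker_for(seg):
    # Pick the speaker turn with the largest time overlap with this segment.
    best, best_overlap = "UNKNOWN", 0.0
    for start, end, spk in turns:
        overlap = min(seg.end, end) - max(seg.start, start)
        if overlap > best_overlap:
            best, best_overlap = spk, overlap
    return best

for seg in segments:
    print(f"[{seg.start:7.2f}-{seg.end:7.2f}] {speaker_for(seg)}: {seg.text.strip()}")
```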
-
https://github.com/collabora/WhisperLive
This is another one that uses Hugging Face's implementation, but I haven't tried it since my hardware doesn't support FlashAttention-2.
-