Top 23 Python Whisper Projects
-
faster-whisper
Project mention: Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model | news.ycombinator.com | 2024-10-14
Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):
https://github.com/SYSTRAN/faster-whisper (speech-to-text)
-
whisperX
Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX
-
buzz
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Project mention: Buzz: Transcribe and translate audio offline on your personal computer | news.ycombinator.com | 2024-03-21
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12
Apparently not. See https://github.com/lifeiteng/OmniSenseVoice/blob/main/src/om.... See also FunASR running SenseVoice but using Kaldi for speaker identification https://github.com/modelscope/FunASR/blob/cd684580991661b9a0...
-
inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
-
nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Details will be shared tomorrow, but from what I have read they have distilled the large model decoder into this turbo that only has 4 layers instead of 32, the encoder should remain the same size. Similar to https://github.com/huggingface/distil-whisper but the model is distilled using multilingual data instead of just English, and the decoder is 4 layers instead of 2.
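For readers unfamiliar with the metric behind the "within 1% word error rate" claim, here is a minimal sketch of the conventional WER definition: word-level edit distance divided by the number of reference words. The function name is illustrative, and the blurb above does not say how its figure was measured or whether text was normalized first, so treat this as the textbook formula rather than the project's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Real evaluations usually lowercase and strip punctuation before scoring; this sketch compares whitespace-separated tokens as they are.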
-
chatgpt-telegram-bot
🤖 A Telegram bot that integrates with OpenAI's official ChatGPT APIs to provide answers, written in Python (by n3d1117)
-
voice-pro
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer (RVC), zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation (UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.
Project mention: Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊 | dev.to | 2025-02-10
GitHub: https://github.com/abus-aikorea/voice-pro
-
WhisperLive
Project mention: I Self-Hosted Llama 3.2 with Coolify on My Home Server: A Step-by-Step Guide | news.ycombinator.com | 2024-10-16
Looks very nice, saved it, will try to give it a try later. What I've worked on last week was always-on speech to text in order to have one more input for my automations. Got to the point where it's pretty accurate, but I imposed some made-up constraints to write some parts of it from scratch to get a single binary that I can deploy. I still have some work to do (never did audio processing before :D) but I'm optimistic.
The easier way would be to spin something like https://github.com/collabora/WhisperLive/ in a docker, open a websocket and pass it to the LLM, I could see that as a feature in your product.
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
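To show what word-level timings make possible, here is a minimal sketch that groups timed words into SubRip (SRT) subtitle cues. It assumes each word is a dict with `text`, `start`, and `end` keys (times in seconds); that shape is an assumption for illustration, so check the library's actual output before relying on it. The function names and the `max_words` parameter are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group consecutive timed words into numbered SRT cues.

    Each cue spans from the start of its first word to the end of its last.
    The word dict keys ("text", "start", "end") are assumed, not guaranteed.
    """
    cues = []
    for number, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        text = " ".join(w["text"].strip() for w in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        cues.append(f"{number}\n{start} --> {end}\n{text}\n")
    # A blank line separates cues in the SRT format.
    return "\n".join(cues)
```

Splitting on a fixed word count is the simplest policy; a dubbing or subtitling tool would more likely break cues at pauses or punctuation.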
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win
It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...
Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.
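The "easily scripted" wildcard handling mentioned above could look like the following sketch, which expands a glob pattern and builds one command per matching file. `EXECUTABLE` and `EXTRA_ARGS` are hypothetical placeholders, not the tool's real executable name or flags; take those from the tool's own help output.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholders: replace with the real executable name and the
# flags listed in the tool's own --help output.
EXECUTABLE = "transcriber.exe"
EXTRA_ARGS: tuple[str, ...] = ()


def expand_pattern(folder: str, pattern: str) -> list[Path]:
    """Return the files in `folder` that match a glob pattern, sorted by name."""
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())


def build_commands(files, executable: str = EXECUTABLE,
                   extra_args: tuple[str, ...] = EXTRA_ARGS) -> list[list[str]]:
    """Build one argument list per input file (no shell quoting needed)."""
    return [[executable, str(f), *extra_args] for f in files]


def run_all(commands: list[list[str]]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands(expand_pattern("recordings", "*.mp3")))` would process every MP3 in a folder; passing argument lists rather than a shell string avoids quoting problems with spaces in file names.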
-
subsai
🎞️ Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants 🎞️
-
speaches
Project mention: Show HN: Mikey – No bot meeting notetaker for Windows | news.ycombinator.com | 2025-02-12
https://github.com/m-bain/whisperX looks promising - I'm hacking away on an always-on transcriber for my notes for later search&recall. It has support for diarization (the speaker detection you're looking for).
I'm currently hacking away on a mix of https://github.com/speaches-ai/speaches + https://github.com/ufal/whisper_streaming though - mostly because my laptop doesn't have a decent GPU, I stream the audio to a home server instead.
But overall it's pretty simple to do after you wrangle the Python dependencies - all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)
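The "sink for the text files" idea can be sketched in a few lines: one text file per meeting, with each transcribed segment appended as it arrives. The function names and the file naming scheme are hypothetical, and how a meeting start is detected is left out, as in the comment above.

```python
import re
from datetime import datetime
from pathlib import Path


def meeting_file(root: Path, title: str, started: datetime) -> Path:
    """Build a filesystem-safe path for one meeting's transcript."""
    # Collapse anything that is not a lowercase letter or digit into a dash.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    return root / f"{started:%Y-%m-%d_%H%M}_{slug}.txt"


def append_segment(path: Path, offset_seconds: float, text: str) -> None:
    """Append one transcribed segment, prefixed with its [MM:SS] offset."""
    path.parent.mkdir(parents=True, exist_ok=True)
    minutes, seconds = divmod(int(offset_seconds), 60)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"[{minutes:02d}:{seconds:02d}] {text.strip()}\n")
```

Appending plain text keeps the files easy to search with ordinary tools, which fits the "search & recall" goal described in the comment.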
-
whisper-ctranslate2
Whisper command line client compatible with original OpenAI client based on CTranslate2.
-
whisper.api
This project provides an API with user level access support to transcribe speech to text using a finetuned and processed Whisper ASR model.
Python Whisper related posts
-
Show HN: Mikey – No bot meeting notetaker for Windows
-
Ask HN: Is Whisper Still Relevant?
-
Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊
-
Show HN: Eleven Labs Alternative – Voice Cloning with RVC and Multilingual TTS
-
Show HN: Voice-Pro – Now More Powerful and Easier to Use
-
Transcriber AI – Free, end-to-end machine based transcription with speaker id
-
Show HN: Voice-Pro – A Comprehensive Gradio WebUI for Advanced Audio Processing
-
Index
What are some of the best open-source Whisper projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | faster-whisper | 14,037 |
| 2 | whisperX | 13,857 |
| 3 | buzz | 13,514 |
| 4 | PaddleSpeech | 11,500 |
| 5 | FunASR | 8,075 |
| 6 | inference | 6,273 |
| 7 | nexa-sdk | 4,345 |
| 8 | wenet | 4,299 |
| 9 | distil-whisper | 3,729 |
| 10 | chatgpt-telegram-bot | 3,187 |
| 11 | voice-pro | 3,173 |
| 12 | WhisperLive | 2,418 |
| 13 | whisper-timestamped | 2,228 |
| 14 | auto-subtitle | 1,762 |
| 15 | Whisper-WebUI | 1,660 |
| 16 | whisper-standalone-win | 1,631 |
| 17 | subsai | 1,383 |
| 18 | yt-whisper | 1,378 |
| 19 | speaches | 1,332 |
| 20 | whisper-ctranslate2 | 981 |
| 21 | truss | 946 |
| 22 | AI-Waifu-Vtuber | 900 |
| 23 | whisper.api | 871 |