Python Whisper

Open-source Python projects categorized as Whisper

Top 23 Python Whisper Projects

  1. faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model | news.ycombinator.com | 2024-10-14

    Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):

    https://github.com/SYSTRAN/faster-whisper (speech-to-text)

  3. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: Ask HN: Is Whisper Still Relevant? | news.ycombinator.com | 2025-02-12

    Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX

  4. buzz

    Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

    Project mention: Buzz: Transcribe and translate audio offline on your personal computer | news.ycombinator.com | 2024-03-21
  5. PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  6. FunASR

    A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

    Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12

    Apparently not. See https://github.com/lifeiteng/OmniSenseVoice/blob/main/src/om.... See also FunASR running SenseVoice but using Kaldi for speaker identification https://github.com/modelscope/FunASR/blob/cd684580991661b9a0...

  7. inference

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

  8. nexa-sdk

    Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS).

    Project mention: Benchmark GGUF models with a one line of code | news.ycombinator.com | 2024-11-01
  10. wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

  11. distil-whisper

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

    Project mention: New OpenAI Whisper model: "turbo" | news.ycombinator.com | 2024-09-30

    Details will be shared tomorrow, but from what I have read they have distilled the large model decoder into this turbo that only has 4 layers instead of 32, the encoder should remain the same size. Similar to https://github.com/huggingface/distil-whisper but the model is distilled using multilingual data instead of just English, and the decoder is 4 layers instead of 2.

  12. chatgpt-telegram-bot

    🤖 A Telegram bot that integrates with OpenAI's official ChatGPT APIs to provide answers, written in Python (by n3d1117)

  13. voice-pro

    Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer (RVC), zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation (UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.

    Project mention: Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊 | dev.to | 2025-02-10

    GitHub: https://github.com/abus-aikorea/voice-pro

  14. WhisperLive

    A nearly-live implementation of OpenAI's Whisper.

    Project mention: I Self-Hosted Llama 3.2 with Coolify on My Home Server: A Step-by-Step Guide | news.ycombinator.com | 2024-10-16

    Looks very nice, saved it, will give it a try later. What I've worked on last week was always-on speech to text in order to have one more input for my automations. Got to the point where it's pretty accurate, but I imposed some made-up constraints to write some parts of it from scratch to get a single binary that I can deploy. I still have some work to do (never did audio processing before :D) but I'm optimistic.

    The easier way would be to spin up something like https://github.com/collabora/WhisperLive/ in Docker, open a websocket and pass it to the LLM. I could see that as a feature in your product.

  15. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

    Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]

    [0] https://github.com/linto-ai/whisper-timestamped

  16. auto-subtitle

    Automatically generate and overlay subtitles for any video.

  17. Whisper-WebUI

    A Web UI for easy subtitle generation using the Whisper model.

    Project mention: Whisper-WebUI | news.ycombinator.com | 2024-08-21
  18. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Whisper-WebUI | news.ycombinator.com | 2024-08-21

    On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win

    It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...

    Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.

  19. subsai

    🎞️ Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants 🎞️

  20. yt-whisper

    Using OpenAI's Whisper to automatically generate YouTube subtitles

  21. speaches

    Project mention: Show HN: Mikey – No bot meeting notetaker for Windows | news.ycombinator.com | 2025-02-12

    https://github.com/m-bain/whisperX looks promising - I'm hacking away on an always-on transcriber for my notes for later search&recall. It has support for diarization (the speaker detection you're looking for).

    I'm currently hacking away on a mix of https://github.com/speaches-ai/speaches + https://github.com/ufal/whisper_streaming though - mostly because my laptop doesn't have a decent GPU, I stream the audio to a home server instead.

    But overall it's pretty simple to do after you wrangle the Python dependencies - all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)

  22. whisper-ctranslate2

    Whisper command-line client, compatible with the original OpenAI client, based on CTranslate2.

  23. truss

    The simplest way to serve AI/ML models in production (by basetenlabs)

  24. AI-Waifu-Vtuber

    AI Vtuber for Streaming on Youtube/Twitch

  25. whisper.api

    This project provides an API with user-level access support to transcribe speech to text using a fine-tuned and processed Whisper ASR model.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Whisper related posts

  • Show HN: Mikey – No bot meeting notetaker for Windows

    6 projects | news.ycombinator.com | 12 Feb 2025
  • Ask HN: Is Whisper Still Relevant?

    2 projects | news.ycombinator.com | 12 Feb 2025
  • Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊

    1 project | dev.to | 10 Feb 2025
  • Show HN: Eleven Labs Alternative – Voice Cloning with RVC and Multilingual TTS

    1 project | news.ycombinator.com | 25 Jan 2025
  • Show HN: Voice-Pro – Now More Powerful and Easier to Use

    1 project | news.ycombinator.com | 23 Jan 2025
  • Transcriber AI – Free, end-to-end machine based transcription with speaker id

    1 project | news.ycombinator.com | 16 Dec 2024
  • Show HN: Voice-Pro – A Comprehensive Gradio WebUI for Advanced Audio Processing

    1 project | news.ycombinator.com | 26 Nov 2024

Index

What are some of the best open-source Whisper projects in Python? This list will help you:

#   Project                 Stars
1   faster-whisper          14,037
2   whisperX                13,857
3   buzz                    13,514
4   PaddleSpeech            11,500
5   FunASR                   8,075
6   inference                6,273
7   nexa-sdk                 4,345
8   wenet                    4,299
9   distil-whisper           3,729
10  chatgpt-telegram-bot     3,187
11  voice-pro                3,173
12  WhisperLive              2,418
13  whisper-timestamped      2,228
14  auto-subtitle            1,762
15  Whisper-WebUI            1,660
16  whisper-standalone-win   1,631
17  subsai                   1,383
18  yt-whisper               1,378
19  speaches                 1,332
20  whisper-ctranslate2        981
21  truss                      946
22  AI-Waifu-Vtuber            900
23  whisper.api                871
