Python speech-to-text

Open-source Python projects categorized as speech-to-text

Top 23 Python speech-to-text Projects

  1. faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model | news.ycombinator.com | 2024-10-14

    Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):

    https://github.com/SYSTRAN/faster-whisper (speech-to-text)

  3. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: Ask HN: Is Whisper Still Relevant? | news.ycombinator.com | 2025-02-12

    Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX

  4. pyvideotrans

    Translate a video from one language to another and add dubbing. Also supports speech-recognition transcription, speech synthesis, and subtitle translation.

    Project mention: Pyvideotrans – AI Video Translation and Voiceover Tool | news.ycombinator.com | 2024-08-14
  5. speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: Speaker Diarization in Python | dev.to | 2024-08-22

    Simple Diarizer: Simple Diarizer is a speaker diarization library that utilizes pretrained models from SpeechBrain. To get started with simple_diarizer, follow these steps:

  6. SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

  7. RealtimeSTT

    A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.

    Project mention: RealTime STT | news.ycombinator.com | 2024-10-09
  8. SenseVoice

    Multilingual Voice Understanding Model

    Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12

    I mean they make a bold statement up top just to paddle back a little bit further down with: "[…] In terms of Chinese and Cantonese recognition, the SenseVoice-Small model has advantages."

    It feels dishonest to me.

    [0] https://github.com/FunAudioLLM/SenseVoice?tab=readme-ov-file...

  10. voice-pro

    Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer (RVC), zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation (UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.

    Project mention: Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊 | dev.to | 2025-02-10

    GitHub: https://github.com/abus-aikorea/voice-pro

  11. lingvo

    Lingvo

  12. LLaMA-Omni

    LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

    Project mention: Hertz-dev, the first open-source base model for conversational audio | news.ycombinator.com | 2024-11-03

    - [LLaMA-Omni](https://github.com/ictnlp/LLaMA-Omni) is a speech-language model built on Llama-3.1-8B-Instruct and trained using just 4 GPUs, offering low-latency, high-quality speech interactions and simultaneous generation of text and speech responses

    - [moshi](https://github.com/kyutai-labs/moshi) a speech-text foundation model that supports low-latency high-quality speech interactions and simultaneous generation of text responses, using Mimi, a SOTA streaming neural audio codec

    - [Mini-Omni](https://github.com/gpt-omni/mini-omni) a multimodal LLM based on Qwen2 offering real-time end-to-end speech input and streaming audio output conversational capabilities

    - [Aria](https://github.com/rhymes-ai/Aria) is a lightweight, multimodal native MoE model with 25B parameters and 3.9B activated parameters per token, offering state-of-the-art performance in multimodal, language, and coding tasks, with a long multimodal context window of 64K tokens and efficient encoding of visual input for fast inference and low fine-tuning cost

    - [Ichigo](https://github.com/homebrewltd/ichigo) an open research project extending a text-based LLM to have native listening ability, using an early fusion technique, with improved multiturn capabilities and refusal to process inaudible queries

  13. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  14. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

    Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]

    [0] https://github.com/linto-ai/whisper-timestamped

  15. kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  16. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Whisper-WebUI | news.ycombinator.com | 2024-08-21

    On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win

    It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...

    Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.

  17. Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions

  18. dc_tts

    A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  19. quillman

    A voice chat app

  20. StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

    Project mention: Ask HN: Real-time speech-to-speech translation | news.ycombinator.com | 2024-10-24

    Has anyone had any luck with an offline, free, open-source real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?

    * https://github.com/ictnlp/StreamSpeech

    * https://github.com/k2-fsa/sherpa-onnx

    * https://github.com/openai/whisper

    I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a 3- to 5-second delay is manageable.

    RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.

    Any suggestions?

  21. whisper-ctranslate2

    Whisper command line client compatible with original OpenAI client based on CTranslate2.

  22. AI-Waifu-Vtuber

    AI Vtuber for Streaming on Youtube/Twitch

  23. nonoCAPTCHA

    An asynchronous Python library to automate solving ReCAPTCHA v2 using audio

  24. whisper-playground

    Build real-time speech2text web apps using OpenAI's Whisper (https://openai.com/blog/whisper/)

  25. june

    Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit (by mezbaul-h)

    Project mention: Show HN: Local Voice Assistant Using Ollama, Transformers and Coqui TTS Toolkit | news.ycombinator.com | 2024-06-20
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python speech-to-text related posts

  • Ask HN: Is Whisper Still Relevant?

    2 projects | news.ycombinator.com | 12 Feb 2025
  • Transcriber AI – Free, end-to-end machine based transcription with speaker id

    1 project | news.ycombinator.com | 16 Dec 2024
  • Ask HN: Real-time speech-to-speech translation

    4 projects | news.ycombinator.com | 24 Oct 2024
  • RealTime STT

    1 project | news.ycombinator.com | 9 Oct 2024
  • WhisperX: Precise ASR with Word-Level Timestamps and Diarization

    1 project | news.ycombinator.com | 5 Sep 2024
  • WhisperX: Precise ASR with Word-Level Timestamps and Speaker Diarization

    1 project | news.ycombinator.com | 3 Sep 2024
  • Whisper-WebUI

    6 projects | news.ycombinator.com | 21 Aug 2024

Index

What are some of the best open-source speech-to-text projects in Python? This list will help you:

#   Project                  Stars
1   faster-whisper          14,037
2   whisperX                13,857
3   pyvideotrans            11,741
4   speechbrain              9,329
5   SpeechRecognition        8,587
6   RealtimeSTT              5,822
7   SenseVoice               4,439
8   voice-pro                3,173
9   lingvo                   2,824
10  LLaMA-Omni               2,795
11  whisper-asr-webservice   2,297
12  whisper-timestamped      2,228
13  kalliope                 1,718
14  whisper-standalone-win   1,601
15  Dragonfire               1,383
16  dc_tts                   1,159
17  quillman                 1,085
18  StreamSpeech             1,028
19  whisper-ctranslate2        973
20  AI-Waifu-Vtuber            900
21  nonoCAPTCHA                896
22  whisper-playground         808
23  june                       748
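
The list is ordered by GitHub star count, highest first. As a minimal sketch of that ordering, the snippet below ranks four rows copied from the index above. The function name rank_by_stars and the tuple layout are illustrative assumptions, not part of any project listed here.

```python
# Star counts copied from four rows of the index, deliberately shuffled.
projects = [
    ("whisperX", 13857),
    ("faster-whisper", 14037),
    ("speechbrain", 9329),
    ("pyvideotrans", 11741),
]


def rank_by_stars(items):
    """Return (rank, name, stars) tuples, highest star count first.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(items, key=lambda item: item[1], reverse=True)
    return [
        (rank, name, stars)
        for rank, (name, stars) in enumerate(ordered, start=1)
    ]


ranking = rank_by_stars(projects)
for rank, name, stars in ranking:
    # Thousands separator matches the formatting used in the index.
    print(f"{rank:<3} {name:<24}{stars:>6,}")
```

Sorting on the star count alone leaves tied projects in their input order, since Python's sort is stable; how the site itself breaks ties is not stated.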
