Python speech-recognition

Open-source Python projects categorized as speech-recognition

Top 23 Python speech-recognition Projects

  1. transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: None of the top 10 projects in GitHub is actually a software project 🤯 | dev.to | 2025-05-10

    We see an addition to the AI community with AutoGPT. Together with TensorFlow, it represents the AI community in the software category, which is becoming more relevant (2 out of 8). We can expect new AI projects such as Transformers or Ollama to reach the top 25 in the future (currently ranked 34 and 36, respectively).

  3. faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model | news.ycombinator.com | 2024-10-14

    Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):

    https://github.com/SYSTRAN/faster-whisper (speech-to-text)

  4. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: Ask HN: Is Whisper Still Relevant? | news.ycombinator.com | 2025-02-12

    Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX

  5. PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  6. FunASR

    A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

    Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12

    Apparently not. See https://github.com/lifeiteng/OmniSenseVoice/blob/main/src/om.... See also FunASR running SenseVoice but using Kaldi for speaker identification https://github.com/modelscope/FunASR/blob/cd684580991661b9a0...

  7. speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: Speaker Diarization in Python | dev.to | 2024-08-22

    Simple Diarizer: Simple Diarizer is a speaker diarization library that utilizes pretrained models from SpeechBrain. To get started with simple_diarizer, follow these steps:

  8. espnet

    End-to-End Speech Processing Toolkit

  10. SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

  11. SenseVoice

    Multilingual Voice Understanding Model

    Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12

    I mean, they make a bold statement up top only to backpedal a little further down with: "[…] In terms of Chinese and Cantonese recognition, the SenseVoice-Small model has advantages."

    It feels dishonest to me.

    [0] https://github.com/FunAudioLLM/SenseVoice?tab=readme-ov-file...

  12. wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

  13. Porcupine

    On-device wake word detection powered by deep learning

  14. distil-whisper

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

    Project mention: New OpenAI Whisper model: "turbo" | news.ycombinator.com | 2024-09-30

    Details will be shared tomorrow, but from what I have read, they have distilled the large model's decoder into this turbo model, which has only 4 layers instead of 32; the encoder should remain the same size. This is similar to distil-whisper (https://github.com/huggingface/distil-whisper), but the model is distilled using multilingual data instead of just English, and the decoder has 4 layers instead of 2.

  15. voice-pro

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    Project mention: Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊 | dev.to | 2025-02-10

    GitHub: https://github.com/abus-aikorea/voice-pro

  16. lingvo

    Lingvo

  17. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  18. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

  19. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Whisper-WebUI | news.ycombinator.com | 2024-08-21

    On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win

    It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...

    Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.

  20. lip-reading-deeplearning

    🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  21. kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  22. Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions

  23. SpeechT5

    Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

  24. SALMONN

    SALMONN: Speech Audio Language Music Open Neural Network

  25. SincNet

    SincNet is a neural architecture for efficiently processing raw audio samples.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
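
The distil-whisper entry above quotes its accuracy as "within 1% word error rate". For readers unfamiliar with the metric, here is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name `word_error_rate` is illustrative, and the sketch assumes plain whitespace tokenization with no text normalization; it is not taken from any project on this list, and real evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.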

Python speech-recognition related posts

  • Run LLMs on Apple Neural Engine (ANE)

    11 projects | news.ycombinator.com | 3 May 2025
  • The Technology Behind YouTube’s Auto-Captioning System

    1 project | dev.to | 29 Apr 2025
  • Amazon Is Discontinuing the "Do Not Send Voice Recordings" Feature on Echo

    3 projects | news.ycombinator.com | 16 Mar 2025
  • The FFT Strikes Back: An Efficient Alternative to Self-Attention

    2 projects | news.ycombinator.com | 27 Feb 2025
  • Ask HN: Is Whisper Still Relevant?

    2 projects | news.ycombinator.com | 12 Feb 2025
  • Transcriber AI – Free, end-to-end machine based transcription with speaker id

    1 project | news.ycombinator.com | 16 Dec 2024
  • Show HN: Voice-Pro – AI Voice Cloning Magic: Transform Any Voice in 15 Seconds

    10 projects | news.ycombinator.com | 27 Nov 2024

Index

What are some of the best open-source speech-recognition projects in Python? This list will help you:

#   Project                    Stars
1   transformers               144,045
2   faster-whisper             15,829
3   whisperX                   15,599
4   PaddleSpeech               11,853
5   FunASR                     10,298
6   speechbrain                9,777
7   espnet                     9,067
8   SpeechRecognition          8,712
9   SenseVoice                 5,578
10  wenet                      4,490
11  Porcupine                  4,088
12  distil-whisper             3,845
13  voice-pro                  3,647
14  lingvo                     2,838
15  whisper-asr-webservice     2,566
16  whisper-timestamped        2,393
17  whisper-standalone-win     2,040
18  lip-reading-deeplearning   1,870
19  kalliope                   1,728
20  Dragonfire                 1,383
21  SpeechT5                   1,336
22  SALMONN                    1,220
23  SincNet                    1,171
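
The index is ordered by GitHub star count, highest first. As a rough illustration of that ordering rule, the sketch below ranks a few rows copied from the table. The helper name `rank_by_stars` and the choice of a plain dictionary are assumptions made for this example, not a description of how the site builds its ranking.

```python
# Star counts copied from the index (a small subset, deliberately unsorted).
stars = {
    "whisperX": 15_599,
    "transformers": 144_045,
    "speechbrain": 9_777,
    "faster-whisper": 15_829,
}


def rank_by_stars(star_counts: dict[str, int]) -> list[tuple[int, str, int]]:
    """Return (rank, project, stars) rows, highest star count first."""
    ordered = sorted(star_counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, count) for rank, (name, count) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, name, count in rank_by_stars(stars):
        # The ",": format spec adds thousands separators, matching the table.
        print(f"{rank} {name} {count:,}")
```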
