Python ASR

Open-source Python projects categorized as ASR

Top 23 Python ASR Projects

  1. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: Ask HN: Is Whisper Still Relevant? | news.ycombinator.com | 2025-02-12

    Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX

  3. NeMo

    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

    Project mention: Speaker Diarization in Python | dev.to | 2024-08-22

    NVIDIA NeMo: To perform speaker diarization using NVIDIA NeMo, follow these steps:

  4. PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  5. speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: Speaker Diarization in Python | dev.to | 2024-08-22

    Simple Diarizer: Simple Diarizer is a speaker diarization library that utilizes pretrained models from SpeechBrain. To get started with simple_diarizer, follow these steps:

  6. SenseVoice

    Multilingual Voice Understanding Model

    Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12

    I mean they make a bold statement up top just to backpedal a little bit further down with: "[…] In terms of Chinese and Cantonese recognition, the SenseVoice-Small model has advantages."

    It feels dishonest to me.

    [0] https://github.com/FunAudioLLM/SenseVoice?tab=readme-ov-file...

  7. nexa-sdk

    Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.

    Project mention: Benchmark GGUF models with a one line of code | news.ycombinator.com | 2024-11-01
  8. wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

  10. youtube-transcript-api

    This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike other Selenium-based solutions!

    Project mention: The Singoff-agen — Learning Through Dumb Projects | dev.to | 2025-04-20

    First, I had to get the data. I thought this would be a good place to use Google Cloud API which can extract YouTube transcripts. But while setting up this service, I realized there was an even easier way: extracting the auto-generated captions from YouTube. (Thank you, jdepoix, for the library to do so: https://github.com/jdepoix/youtube-transcript-api).

  11. lingvo

    Lingvo

  12. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  13. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

  14. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Whisper-WebUI | news.ycombinator.com | 2024-08-21

    On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win

    It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...

    Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.

  15. SincNet

    SincNet is a neural architecture for efficiently processing raw audio samples.

  16. StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

    Project mention: Ask HN: Real-time speech-to-speech translation | news.ycombinator.com | 2024-10-24

    Has anyone had any luck with an offline, free, open-source real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?

    * https://github.com/ictnlp/StreamSpeech

    * https://github.com/k2-fsa/sherpa-onnx

    * https://github.com/openai/whisper

    I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a 3- to 5-second delay is manageable.

    RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.

    Any suggestions?

  17. vosk-server

    WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries

  18. pykaldi

    A Python wrapper for Kaldi

  19. whisper.api

    This project provides an API with user level access support to transcribe speech to text using a finetuned and processed Whisper ASR model.

  20. CrisperWhisper

    Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

    Project mention: CrisperWhisper: Automatic Speech Recognition with improved word-level timestamps | news.ycombinator.com | 2024-11-22
  21. cheetah

    On-device streaming speech-to-text engine powered by deep learning (by Picovoice)

  22. AutoSub

    A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui (by abhirooptalasila)

  23. pyannote-whisper

  24. leopard

    On-device speech-to-text engine powered by deep learning

  25. reverb

    Open source inference code for Rev's model

    Project mention: The Technology Behind YouTube’s Auto-Captioning System | dev.to | 2025-04-29

    For those interested in exploring or extracting YouTube transcripts for their own projects, tools like Transcriptly and Rev.com offer additional functionality, such as downloading, editing, and translating captions.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python ASR related posts

  • The Technology Behind YouTube’s Auto-Captioning System

    1 project | dev.to | 29 Apr 2025
  • Show HN: Mikey – No bot meeting notetaker for Windows

    6 projects | news.ycombinator.com | 12 Feb 2025
  • Ask HN: Is Whisper Still Relevant?

    2 projects | news.ycombinator.com | 12 Feb 2025
  • Transcriber AI – Free, end-to-end machine based transcription with speaker id

    1 project | news.ycombinator.com | 16 Dec 2024
  • Supercharge Your AI Skills: 5 Open Source Repositories You Can't Afford to Miss

    5 projects | dev.to | 21 Nov 2024
  • Benchmark GGUF models with a one line of code

    1 project | news.ycombinator.com | 1 Nov 2024
  • Benchmark GGUF models with a ONE line of code

    1 project | news.ycombinator.com | 27 Oct 2024

Index

What are some of the best open-source ASR projects in Python? This list will help you:

 #  Project                  Stars
 1  whisperX                15,599
 2  NeMo                    14,217
 3  PaddleSpeech            11,875
 4  speechbrain              9,808
 5  SenseVoice               5,578
 6  nexa-sdk                 4,534
 7  wenet                    4,490
 8  youtube-transcript-api   3,876
 9  lingvo                   2,838
10  whisper-asr-webservice   2,579
11  whisper-timestamped      2,393
12  whisper-standalone-win   2,040
13  SincNet                  1,171
14  StreamSpeech             1,071
15  vosk-server              1,042
16  pykaldi                  1,015
17  whisper.api                880
18  CrisperWhisper             696
19  cheetah                    622
20  AutoSub                    595
21  pyannote-whisper           586
22  leopard                    455
23  reverb                     402
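
The index above is ranked by GitHub star count. As a minimal sketch, assuming the rows are available as plain whitespace-separated `rank project stars` lines and that project names contain no spaces, the following parses a few rows copied from the index and checks that the star counts are in descending order. The names `IndexRow`, `parse_index_row`, and `is_sorted_by_stars` are hypothetical helpers introduced here for illustration and are not part of any listed project.

```python
from typing import NamedTuple


class IndexRow(NamedTuple):
    """One row of the index: rank, project name, star count."""
    rank: int
    project: str
    stars: int


def parse_index_row(line: str) -> IndexRow:
    """Parse one 'rank project stars' line, e.g. '1 whisperX 15,599'.

    Assumes the project name contains no whitespace.
    """
    rank_text, project, stars_text = line.split()
    # Star counts use a comma as the thousands separator.
    return IndexRow(int(rank_text), project, int(stars_text.replace(",", "")))


def is_sorted_by_stars(rows: list[IndexRow]) -> bool:
    """Return True if star counts never increase from one row to the next."""
    return all(a.stars >= b.stars for a, b in zip(rows, rows[1:]))


# Sample rows copied from the index above (not the full list).
SAMPLE = """\
1 whisperX 15,599
2 NeMo 14,217
3 PaddleSpeech 11,875
22 leopard 455
23 reverb 402
"""

rows = [parse_index_row(line) for line in SAMPLE.splitlines() if line.strip()]
print(is_sorted_by_stars(rows))
```

Splitting on arbitrary whitespace keeps the parser indifferent to whether the rows are single-spaced or column-aligned.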
