Python speech-recognition

Open-source Python projects categorized as speech-recognition

Top 23 Python speech-recognition Projects

speech-recognition
  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: How to Learn Generative AI: A Step-by-Step Guide | dev.to | 2024-09-23

    Play around with OpenAI’s GPT models and Hugging Face's Transformers library.

  • whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: WhisperX: Precise ASR with Word-Level Timestamps and Diarization | news.ycombinator.com | 2024-09-05
  • faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26

    I've been using this:

    https://github.com/bugbakery/transcribee

    It's noticeably work-in-progress but it does the job and has a nice UI to edit transcriptions and speakers etc.

    It's running on the CPU for me, would be nice to have something that can make use of a 4GB Nvidia GPU, which faster-whisper is actually able to [1]

    https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...

  • PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.

  • speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: Speaker Diarization in Python | dev.to | 2024-08-22

    Simple Diarizer is a speaker diarization library that uses pretrained models from SpeechBrain. To get started with simple_diarizer, follow these steps:

  • SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

    Project mention: help with script (beginner) | /r/learnpython | 2023-12-07

    Start and Stop Listening Example

  • espnet

    End-to-End Speech Processing Toolkit

    Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17

    You might check out this list from espnet. They list the different corpuses they use to train their models sorted by language and task (ASR, TTS etc):

    https://github.com/espnet/espnet/blob/master/egs2/README.md

  • FunASR

    A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

    Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
  • wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

  • Porcupine

    On-device wake word detection powered by deep learning

  • distil-whisper

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

    Project mention: New OpenAI Whisper model: "turbo" | news.ycombinator.com | 2024-09-30

    Details will be shared tomorrow, but from what I have read they have distilled the large model decoder into this turbo that only has 4 layers instead of 32, the encoder should remain the same size. Similar to https://github.com/huggingface/distil-whisper but the model is distilled using multilingual data instead of just English, and the decoder is 4 layers instead of 2.

  • lingvo

    Lingvo is a framework for building neural networks in TensorFlow, particularly sequence models.

  • whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  • whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

    Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]

    [0] https://github.com/linto-ai/whisper-timestamped

  • lip-reading-deeplearning

    Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  • kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  • Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions

  • whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Whisper-WebUI | news.ycombinator.com | 2024-08-21

    On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win

    It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...

    Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.

  • SpeechT5

    Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

  • SincNet

    SincNet is a neural architecture for efficiently processing raw audio samples.

  • kaldi-gstreamer-server

    Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

  • quillman

    A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

    Project mention: ChatGPT Voice Announced (By Greg Brockman) | news.ycombinator.com | 2023-11-21

    https://github.com/modal-labs/quillman

    I built something similar using WebKit speech recognition but that's limited to Chromium.

  • SALMONN

    SALMONN: Speech Audio Language Music Open Neural Network

    Project mention: Comparing Humans, GPT-4, and GPT-4V on Abstraction and Reasoning Tasks | news.ycombinator.com | 2023-11-19

    > In other words, if you express a problem in a more complicated space (e.g. a visual problem, or an abstract algebra problem), you will not be able to solve it in the smaller token space, there's not enough information

    You're aware multimodal transformers do exactly this?

    https://github.com/bytedance/SALMONN

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
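WhisperX's tagline pairs word-level timestamps with speaker diarization. One common way to merge the two outputs is to label each transcribed word with the speaker whose diarization turn overlaps it the most. The sketch below assumes plain dataclasses for words and speaker turns; `Word`, `Turn`, `overlap` and `assign_speakers` are hypothetical names, and this is not WhisperX's own code or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: Optional[str] = None


@dataclass
class Turn:
    speaker: str  # illustrative label, e.g. "SPEAKER_00"
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn keep speaker=None.
    """
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            ov = overlap(word.start, word.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        word.speaker = best_speaker
    return words


if __name__ == "__main__":
    turns = [Turn("SPEAKER_00", 0.0, 2.0), Turn("SPEAKER_01", 2.0, 4.0)]
    words = [Word("hello", 0.1, 0.5), Word("there", 1.8, 2.4), Word("hi", 2.6, 2.9)]
    for w in assign_speakers(words, turns):
        print(w.text, w.speaker)
```

A word that straddles a turn boundary ("there" above) goes to whichever speaker covers more of it; the nested loop is quadratic, which is fine for a sketch but would call for a sorted sweep on long recordings.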
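The faster-whisper comment above wishes for something that can use a 4GB Nvidia GPU. faster-whisper runs Whisper through CTranslate2, whose reduced-precision compute types (selected with `WhisperModel(..., compute_type=...)`, for example `"float16"` or `"int8"`) shrink the model weights. A rough, weights-only estimate is parameter count times bytes per parameter. The parameter counts below are the approximate figures published for the original OpenAI Whisper checkpoints (an assumption for this sketch); real VRAM use is higher because activations, buffers and the CUDA context are not counted.

```python
# Approximate parameter counts published for the original OpenAI Whisper
# checkpoints (assumption: used here only for back-of-the-envelope sizing).
WHISPER_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gib(model: str, dtype: str) -> float:
    """Weights-only footprint in GiB; ignores activations, caches and runtime overhead."""
    return WHISPER_PARAMS[model] * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("float32", "float16", "int8"):
        print(f"large / {dtype}: {weight_gib('large', dtype):.2f} GiB")
```

Under these assumptions the large model's weights come to about 2.9 GiB in float16 and about 1.4 GiB in int8, which suggests why an int8 compute type leaves more headroom on a 4GB card; measure actual memory use before relying on the estimate.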
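The SpeechRecognition mention points to the library's start/stop listening example. In that library, `Recognizer.listen_in_background()` processes audio on a background thread and returns a function that stops it when called. The sketch below reproduces only that control pattern with the standard library: a daemon thread feeds chunks from a fake microphone to a callback, and a `threading.Event` tells it to stop. `start_background_listener` and `fake_microphone` are hypothetical names, not part of SpeechRecognition, and no recognition happens here.

```python
import threading
import time
from typing import Callable, Iterator


def start_background_listener(chunks: Iterator[bytes],
                              callback: Callable[[bytes], None]) -> Callable[..., None]:
    """Feed audio chunks to `callback` on a daemon thread.

    Returns a stop function; calling it sets an Event that the worker checks
    before each chunk and, by default, waits for the worker thread to exit.
    """
    stop_event = threading.Event()

    def worker() -> None:
        for chunk in chunks:
            if stop_event.is_set():
                break
            callback(chunk)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    def stop(wait_for_stop: bool = True) -> None:
        stop_event.set()
        if wait_for_stop:
            thread.join()

    return stop


def fake_microphone() -> Iterator[bytes]:
    """Hypothetical endless source of silent 10 ms frames (16 kHz, 16-bit mono = 320 bytes)."""
    while True:
        time.sleep(0.01)
        yield b"\x00" * 320


if __name__ == "__main__":
    frames = []
    stop = start_background_listener(fake_microphone(), frames.append)
    time.sleep(0.2)  # the main thread is free to do other work meanwhile
    stop()
    print(f"captured {len(frames)} frames")
```

Returning a closure instead of exposing the thread keeps callers from touching the worker directly; joining in `stop()` guarantees no callback fires after it returns.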
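The distil-whisper tagline states its accuracy in terms of word error rate (WER): the number of word substitutions, deletions and insertions needed to turn a transcript into the reference, divided by the number of reference words. A minimal sketch computing WER with a word-level edit distance; it only splits on whitespace, whereas published evaluations typically normalize casing and punctuation first, so its numbers are not directly comparable to the project's.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Computed as a word-level Levenshtein distance with dynamic programming.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row 0: turning an empty reference into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # curr[j] = edit distance between ref[:i] and hyp[:j]
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In the usage line, "sit" is one substitution and the missing "the" is one deletion, so WER is 2/6, about 0.33.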
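whisper-timestamped (like WhisperX) returns a start and end time for each word, and a dubbing or subtitling tool then has to group those words into cues. This sketch assumes each word is a dict with `"text"`, `"start"` and `"end"` keys in seconds (an assumed shape; check the library's actual output) and emits SubRip (SRT) cues, starting a new cue after `max_words` words or after a pause longer than `max_gap` seconds. Both thresholds, and the helper names `srt_timestamp` and `words_to_srt`, are illustrative.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7, max_gap: float = 0.8) -> str:
    """Group timed words into numbered SRT cues.

    A new cue starts when the current one already has `max_words` words or
    when the silence before the next word exceeds `max_gap` seconds.
    """
    cues: List[List[Dict]] = []
    for word in words:
        if (not cues or len(cues[-1]) >= max_words
                or word["start"] - cues[-1][-1]["end"] > max_gap):
            cues.append([])
        cues[-1].append(word)
    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w["text"].strip() for w in cue)
        blocks.append(f"{index}\n{srt_timestamp(cue[0]['start'])} --> "
                      f"{srt_timestamp(cue[-1]['end'])}\n{text}\n")
    # Each block ends with a newline, so joining on "\n" leaves one blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [{"text": "Hello", "start": 0.0, "end": 0.4},
            {"text": "world", "start": 0.5, "end": 0.9},
            {"text": "again", "start": 3.0, "end": 3.4}]
    print(words_to_srt(demo))
```

Because cue boundaries come straight from the word times, any drift in those times shows up directly in the subtitles, which is why the accuracy of word-level timings matters for this use.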
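SincNet's first layer constrains each convolutional filter to a band-pass shape built from two sinc functions, so only the low and high cutoff frequencies of each filter are learned. A minimal sketch that computes one such Hamming-windowed kernel with the standard library, using cutoffs normalized by the sample rate. The kernel length (251 taps) and the 300 to 3400 Hz band used in the test are illustrative choices, and learning the cutoffs is left out, so this is not the SincNet implementation.

```python
import math
from typing import List


def sinc(x: float) -> float:
    """Unnormalized sinc: sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass_kernel(f1_hz: float, f2_hz: float, sample_rate: int,
                         length: int = 251) -> List[float]:
    """Hamming-windowed band-pass kernel g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n).

    f1 and f2 are normalized by the sample rate (cycles per sample); n runs
    from -(length-1)/2 to (length-1)/2, so `length` must be odd.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the kernel has a center tap")
    if not 0 <= f1_hz < f2_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f1 < f2 <= Nyquist")
    f1 = f1_hz / sample_rate
    f2 = f2_hz / sample_rate
    half = (length - 1) // 2
    kernel = []
    for i in range(length):
        n = i - half
        # Difference of two ideal low-pass responses = ideal band-pass response.
        band = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        # Hamming window tapers the truncated kernel to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(band * window)
    return kernel


if __name__ == "__main__":
    taps = sinc_bandpass_kernel(300, 3400, 16000)
    print(len(taps), round(taps[len(taps) // 2], 4))
```

At the center tap the window equals 1 and both sincs equal 1, so the value is 2(f2 - f1); the taps also sum to nearly zero because a band-pass filter blocks DC.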

Python speech-recognition related posts

  • Show HN: Auto Transcribe/Subtitle Videos – Python OpenAI Whisper Wrapper and GUI

    1 project | news.ycombinator.com | 3 Oct 2024
  • New OpenAI Whisper model: "turbo"

    2 projects | news.ycombinator.com | 30 Sep 2024
  • WhisperX: Precise ASR with Word-Level Timestamps and Diarization

    1 project | news.ycombinator.com | 5 Sep 2024
  • WhisperX: Precise ASR with Word-Level Timestamps and Speaker Diarization

    1 project | news.ycombinator.com | 3 Sep 2024
  • OctoTube: Voice Search for YouTube

    1 project | dev.to | 22 Aug 2024
  • OTranscribe: A free and open tool for transcribing audio interviews

    8 projects | news.ycombinator.com | 9 Aug 2024
  • StreamSpeech: "All in One" model for simultaneous ASR, translation and TTS

    1 project | news.ycombinator.com | 17 Jun 2024

Index

What are some of the best open-source speech-recognition projects in Python? This list will help you:

Rank  Project                   GitHub stars
1     transformers              132,909
2     whisperX                  11,720
3     faster-whisper            11,628
4     PaddleSpeech              10,970
5     speechbrain               8,624
6     SpeechRecognition         8,356
7     espnet                    8,346
8     FunASR                    6,179
9     wenet                     4,085
10    Porcupine                 3,706
11    distil-whisper            3,538
12    lingvo                    2,811
13    whisper-asr-webservice    1,988
14    whisper-timestamped       1,913
15    lip-reading-deeplearning  1,820
16    kalliope                  1,711
17    Dragonfire                1,384
18    whisper-standalone-win    1,188
19    SpeechT5                  1,163
20    SincNet                   1,115
21    kaldi-gstreamer-server    1,069
22    quillman                  1,030
23    SALMONN                   996
