Python speech-recognition

Open-source Python projects categorized as speech-recognition

Top 23 Python speech-recognition Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: Fine-Tuned Llama2 Inserting Unnecessary Delimiters | /r/LocalLLaMA | 2023-11-04

    While it's tough to say something specific since we don't know exactly how you trained it, the prompt format of your training input, or how you are performing inference, one thing I found when I faced similar issues is that the model does not know when to stop. Part of the cause is that the fast llama tokenizer does not add the token when encoding your inputs. So you can either add that token explicitly to the input text of each sample or use the slow llama tokenizer. Check the llama_recipes GitHub repo for the exact issue https://github.com/huggingface/transformers/issues/22794. The other thing you will most likely want to check is whether the model.generate output contains the exact input tokens too. That is the expected behavior of some models (like llama2 or mpt), for example when you use vanilla transformers for inference.
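The two checks described in this post can be sketched on plain lists of token ids, without loading any model. This is a minimal illustration, not the transformers API: it assumes the token in question is an end-of-sequence marker, and `EOS_ID`, `with_eos`, and `strip_prompt` are hypothetical names invented for the example.

```python
from typing import List

# Assumed EOS token id, for illustration only; in practice read it from your tokenizer.
EOS_ID = 2


def with_eos(input_ids: List[int], eos_id: int = EOS_ID) -> List[int]:
    """Return a copy of a training sample that ends with the EOS id."""
    if input_ids and input_ids[-1] == eos_id:
        return list(input_ids)
    return list(input_ids) + [eos_id]


def strip_prompt(output_ids: List[int], prompt_ids: List[int]) -> List[int]:
    """Drop the echoed prompt when the generated ids start with it."""
    n = len(prompt_ids)
    if output_ids[:n] == prompt_ids:
        return output_ids[n:]
    return output_ids


# Made-up token ids standing in for a tokenized prompt and a model output.
prompt = [1, 15043, 3186]
generated = prompt + [29991, EOS_ID]

print(with_eos(prompt))
print(strip_prompt(generated, prompt))
```

With a real tokenizer and model, the same idea would apply to the ids the tokenizer returns and the ids the generate call produces.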

  • PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

    Project mention: Open Source Libraries | /r/AudioAI | 2023-10-02

    PaddlePaddle/PaddleSpeech

  • NeMo

    NeMo: a toolkit for conversational AI

    Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06

    I haven't tested WaveRNN, but of the ones I know, the best open-source option is FastPitch. It's easy to use; here is the tutorial for voice cloning.

  • SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

    Project mention: MacWhisper: Transcribe audio files on your Mac | news.ycombinator.com | 2023-08-23

    There is a great library that supports not only OpenAI's Whisper but also many other engines, some of which work offline: https://github.com/Uberi/speech_recognition
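A minimal sketch of offline file transcription with this library might look like the following. It assumes the `SpeechRecognition` package is installed, plus `openai-whisper` for the Whisper engine or `pocketsphinx` for the Sphinx engine. `transcribe_file` and `ENGINES` are hypothetical names for this example; the `Recognizer`, `AudioFile`, `record`, `recognize_whisper`, and `recognize_sphinx` calls come from the library.

```python
# Maps a short engine name to the Recognizer method that runs it offline.
ENGINES = {
    "whisper": "recognize_whisper",  # needs the openai-whisper package
    "sphinx": "recognize_sphinx",    # needs the pocketsphinx package
}


def transcribe_file(path: str, engine: str = "whisper") -> str:
    """Transcribe a WAV/AIFF/FLAC file with one of the offline engines."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}; choose from {sorted(ENGINES)}")

    # Imported lazily so the argument check above works without the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    return getattr(recognizer, ENGINES[engine])(audio)
```

Calling `transcribe_file("interview.wav", engine="sphinx")` would return the recognized text as a string; the file name here is a placeholder.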

  • espnet

    End-to-End Speech Processing Toolkit

    Project mention: [D] What's stopping you from working on speech and voice? | /r/MachineLearning | 2023-01-30

    - https://github.com/espnet/espnet

  • speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: [D] Training ASR model using SpeechBrain | /r/MachineLearning | 2023-06-05

    You likely have a very broken sample in one of your batches. It looks like your training actually went through a few batches before it horked the error at you. A quick Google search shows a similar issue in the GitHub repo: https://github.com/speechbrain/speechbrain/issues/649.

  • faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller | news.ycombinator.com | 2023-10-31

    That's the implication. If the distil models are in the same format as the original OpenAI models, then they can be converted for faster-whisper use as per the conversion instructions on https://github.com/guillaumekln/faster-whisper/

    So then we'll see whether we get the 6x model speedup on top of the stated 4x faster-whisper code speedup.

  • Porcupine

    On-device wake word detection powered by deep learning

    Project mention: I made a ChatGPT virtual assistant that you can talk to | /r/ArtificialInteligence | 2023-04-05

    I call it DaVinci. DaVinci uses Picovoice (https://picovoice.ai/) solutions for wake word and voice activity detection and for converting speech to text, Amazon Polly to convert its responses into a natural sounding voice, and OpenAI’s GPT 3.5 to do the heavy lifting. It’s all contained in about 300 lines of Python code.

  • lingvo

    Lingvo

  • distil-whisper

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

    Project mention: Distil-Whisper: a distilled variant of Whisper that is 6x faster | /r/AudioAI | 2023-11-17

    Training code will be released in the Distil-Whisper repository this week, enabling anyone in the community to distill a Whisper model in their choice of language!

  • lip-reading-deeplearning

    Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  • kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

    Project mention: Can I run Kalliope on Windows ? | /r/learnpython | 2023-01-07

    I have found this package called Kalliope which is a personal assistant framework. I tried pip3 install kalliope, but I get an error on installing pyalsaaudio:

  • Dragonfire

    the open-source virtual assistant for Ubuntu based Linux distributions

  • whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

    Project mention: How I converted a podcast into a knowledge base using Orama search and OpenAI whisper and Astro | dev.to | 2023-05-23

  • SincNet

    SincNet is a neural architecture for efficiently processing raw audio samples.

    Project mention: Does this SincNet (neural architecture) contain a discriminator? | /r/learnmachinelearning | 2022-12-30

  • whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: AI-assisted removal of filler words from video recordings | dev.to | 2023-11-01

    whisper-timestamped, which is a layer on top of the Whisper set of models enabling us to get accurate word timestamps and include filler words in transcription output. This transcriber downloads the selected Whisper model to the machine running the demo and no third-party API keys are required.
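Once word-level timestamps are available, removing filler words reduces to computing which time ranges to keep. The sketch below is not the article's implementation: it assumes each word is a dict with `text`, `start`, and `end` keys (modelled on the kind of output whisper-timestamped produces), and `FILLERS` and `keep_ranges` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Illustrative filler list; not taken from the original post.
FILLERS = {"um", "uh", "erm", "hmm"}


def keep_ranges(words: List[Dict], fillers=FILLERS) -> List[Tuple[float, float]]:
    """Return (start, end) ranges that cover every run of non-filler words."""
    ranges: List[Tuple[float, float]] = []
    cut = True  # open a new range at the start and after each removed filler
    for w in words:
        token = w["text"].strip().lower().strip(".,!?")
        if token in fillers:
            cut = True
            continue
        if cut:
            ranges.append((w["start"], w["end"]))
            cut = False
        else:
            # Extend the current range to the end of this word.
            ranges[-1] = (ranges[-1][0], w["end"])
    return ranges


words = [
    {"text": "So", "start": 0.0, "end": 0.3},
    {"text": "um,", "start": 0.3, "end": 0.8},
    {"text": "hello", "start": 0.8, "end": 1.2},
    {"text": "world", "start": 1.3, "end": 1.7},
]
print(keep_ranges(words))  # [(0.0, 0.3), (0.8, 1.7)]
```

The resulting ranges could then be handed to a video or audio editing tool to cut the recording.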

  • kaldi-gstreamer-server

    Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

    Project mention: Real-time full-duplex speech recognition server, based on Kaldi and GStreamer | news.ycombinator.com | 2022-12-01

  • pykaldi

    A Python wrapper for Kaldi

  • speechpy

    SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/

  • SpeechT5

    Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

    Project mention: [HELP] Speech2Speech translator with speaker voice preservation | /r/learnmachinelearning | 2023-05-20

    Hey! I’m doing a somewhat similar project but for TTS / voice cloning. This might not be too relevant for you but it might be one way to solve your problem. We based our project on SpeechT5, which is a multimodal setup that can take in audio or text and output audio or text. It uses speaker embeddings to handle multiple speakers, so you could use Meta's S2ST to translate audio and this model to preserve the voice by doing audio-to-audio speech conversion. Here’s a Hugging Face tutorial which mentions speech conversion with SpeechT5: https://huggingface.co/blog/speecht5

  • vosk-server

    WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries

  • DeepSpeech-examples

    Examples of how to use or integrate DeepSpeech

  • lhotse

    Tools for handling speech data in machine learning projects.

    Project mention: Does anyone else find lhotse a pain to use | /r/speechtech | 2023-06-14

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-11-17.

Index

What are some of the best open-source speech-recognition projects in Python? This list will help you:

Rank  Project                   Stars
1     transformers              116,187
2     PaddleSpeech              9,156
3     NeMo                      8,556
4     SpeechRecognition         7,693
5     espnet                    7,400
6     speechbrain               6,872
7     faster-whisper            5,814
8     Porcupine                 3,251
9     lingvo                    2,765
10    distil-whisper            2,377
11    lip-reading-deeplearning  1,776
12    kalliope                  1,675
13    Dragonfire                1,363
14    whisper-asr-webservice    1,122
15    SincNet                   1,060
16    whisper-timestamped       1,047
17    kaldi-gstreamer-server    1,038
18    pykaldi                   965
19    speechpy                  883
20    SpeechT5                  861
21    vosk-server               776
22    DeepSpeech-examples       772
23    lhotse                    766