Top 23 Python speech-recognition Projects
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Project mention: Fine-Tuned Llama2 Inserting Unnecessary Delimiters | /r/LocalLLaMA | 2023-11-04
While it's tough to say anything specific, since we don't know exactly how you trained it, the prompt format of your training input, or how you are performing inference, one thing I found when I faced similar issues is that the model does not know when to stop. Part of the cause is that the fast Llama tokenizer does not add the end-of-sequence (EOS) token when encoding your inputs. So you can either add that token explicitly to the input text of each sample or use the slow Llama tokenizer. Check the llama_recipes GitHub repo for the exact issue: https://github.com/huggingface/transformers/issues/22794. The other most probable thing you might want to check is whether the model.generate output also contains the exact input tokens. That is the expected behavior of some models (like Llama 2 or MPT), for example when you use vanilla transformers for inference.
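The two checks in the comment above can be sketched in plain Python on lists of token ids. This is only an illustration, not the transformers API: the helper names `append_eos` and `strip_prompt`, the id `2` used for the end-of-sequence token, and the sample ids are all assumptions made for the example. In practice the ids would come from your tokenizer and from the `model.generate` output.

```python
from typing import List


def append_eos(input_ids: List[int], eos_token_id: int) -> List[int]:
    """Return a copy of input_ids that ends with eos_token_id (hypothetical helper)."""
    if input_ids and input_ids[-1] == eos_token_id:
        return list(input_ids)  # already terminated, nothing to add
    return list(input_ids) + [eos_token_id]


def strip_prompt(output_ids: List[int], prompt_length: int) -> List[int]:
    """Drop the echoed prompt tokens from a generated sequence (hypothetical helper)."""
    return list(output_ids[prompt_length:])


EOS_ID = 2  # assumed placeholder; read the real value from your tokenizer

# Training side: make sure every sample ends with the EOS token.
prompt_ids = [1, 15043, 3186]  # made-up token ids
training_ids = append_eos(prompt_ids, EOS_ID)

# Inference side: the generated sequence starts with the prompt itself,
# so keep only what comes after it.
generated_ids = prompt_ids + [29991, EOS_ID]
completion_ids = strip_prompt(generated_ids, len(prompt_ids))

print(training_ids)
print(completion_ids)
```

If the completion still never ends with the EOS id after training, that would point back to the first problem: the samples the model saw were not terminated.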
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Project mention: Open Source Libraries | /r/AudioAI | 2023-10-02
NeMo: a toolkit for conversational AI
Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06
I haven't tested WaveRNN, but of the open-source options I know, the best is FastPitch. It's also easy to use; here is the tutorial for voice cloning.
Speech recognition module for Python, supporting several engines and APIs, online and offline.
End-to-End Speech Processing Toolkit
Project mention: [D] What's stopping you from working on speech and voice? | /r/MachineLearning | 2023-01-30
A PyTorch-based Speech Toolkit
Project mention: [D] Training ASR model using SpeechBrain | /r/MachineLearning | 2023-06-05
You likely have a very broken sample in one of your batches. It looks like your training actually went through a few batches before it horked the error at you. A quick Google search shows a similar issue in the GitHub repo: https://github.com/speechbrain/speechbrain/issues/649.
Faster Whisper transcription with CTranslate2
Project mention: Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller | news.ycombinator.com | 2023-10-31
That's the implication. If the Distil models are in the same format as the original OpenAI models, then they can be converted for faster-whisper use as per the conversion instructions at https://github.com/guillaumekln/faster-whisper/
So then we'll see whether we get the 6x model speedup on top of the stated 4x faster-whisper code speedup.
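As a rough illustration of what is at stake in that comment, the sketch below multiplies the two quoted factors. It assumes the 6x model speedup and the 4x code speedup are independent and compose multiplicatively, which is exactly what the commenter says remains to be seen. The function name `combined_speedup` is made up for this example.

```python
def combined_speedup(*factors: float) -> float:
    """Multiply individual speedup factors, assuming they are independent."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


# Best case only: both optimizations stack perfectly.
best_case = combined_speedup(6.0, 4.0)
print(best_case)  # 24.0
```

In practice the two speedups may overlap, so the measured figure could land well below this upper bound.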
On-device wake word detection powered by deep learning
Project mention: I made a ChatGPT virtual assistant that you can talk to | /r/ArtificialInteligence | 2023-04-05
I call it DaVinci. DaVinci uses Picovoice (https://picovoice.ai/) solutions for wake word and voice activity detection and for converting speech to text, Amazon Polly to convert its responses into a natural sounding voice, and OpenAI’s GPT 3.5 to do the heavy lifting. It’s all contained in about 300 lines of Python code.
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Project mention: Distil-Whisper: a distilled variant of Whisper that is 6x faster | /r/AudioAI | 2023-11-17
Training code will be released in the Distil-Whisper repository this week, enabling anyone in the community to distill a Whisper model in their choice of language!
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Kalliope is a framework that will help you to create your own personal assistant.
Project mention: Can I run Kalliope on Windows ? | /r/learnpython | 2023-01-07
I have found this package called Kalliope which is a personal assistant framework. I tried pip3 install kalliope, but I get an error on installing pyalsaaudio:
the open-source virtual assistant for Ubuntu based Linux distributions
OpenAI Whisper ASR Webservice API
Project mention: How I converted a podcast into a knowledge base using Orama search and OpenAI whisper and Astro | dev.to | 2023-05-23
SincNet is a neural architecture for efficiently processing raw audio samples.
Project mention: Does this SincNet (neural architecture) contain a discriminator? | /r/learnmachinelearning | 2022-12-30
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Project mention: AI-assisted removal of filler words from video recordings | dev.to | 2023-11-01
whisper-timestamped, which is a layer on top of the Whisper set of models enabling us to get accurate word timestamps and include filler words in transcription output. This transcriber downloads the selected Whisper model to the machine running the demo and no third-party API keys are required.
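To make the idea of filler-word removal concrete, here is a minimal sketch that turns word-level timestamps into the time ranges worth keeping. It does not call whisper-timestamped; it works on a hand-written list of word dictionaries. The `text`, `start`, and `end` keys, the `FILLERS` set, and the helper `keep_intervals` are assumptions for the example, so adapt them to whatever your transcriber actually returns.

```python
from typing import Dict, List, Set, Tuple

# Assumed filler vocabulary; extend it for your language and speakers.
FILLERS: Set[str] = {"um", "uh", "erm", "hmm"}


def keep_intervals(words: List[Dict], fillers: Set[str] = FILLERS) -> List[Tuple[float, float]]:
    """Return (start, end) ranges covering every word that is not a filler.

    Back-to-back kept words are merged into a single range.
    """
    intervals: List[Tuple[float, float]] = []
    for word in words:
        token = word["text"].strip().lower().strip(".,!?")
        if token in fillers:
            continue  # skip the filler; its time span is left out
        if intervals and abs(intervals[-1][1] - word["start"]) < 1e-9:
            # This word starts where the previous kept range ends: extend it.
            intervals[-1] = (intervals[-1][0], word["end"])
        else:
            intervals.append((word["start"], word["end"]))
    return intervals


# Hand-written sample in the assumed shape (times in seconds).
sample_words = [
    {"text": "So", "start": 0.0, "end": 0.4},
    {"text": "um,", "start": 0.4, "end": 0.9},
    {"text": "hello", "start": 0.9, "end": 1.3},
    {"text": "world", "start": 1.3, "end": 1.8},
]

print(keep_intervals(sample_words))  # [(0.0, 0.4), (0.9, 1.8)]
```

The resulting ranges could then be handed to a video or audio editing tool to cut out everything in between.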
Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
Project mention: Real-time full-duplex speech recognition server, based on Kaldi and GStreamer | news.ycombinator.com | 2022-12-01
A Python wrapper for Kaldi
:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Project mention: [HELP] Speech2Speech translator with speaker voice preservation | /r/learnmachinelearning | 2023-05-20
Hey! I’m doing a somewhat similar project but for TTS / voice cloning. This might not be too relevant for you, but it might be one way to solve your problem. We based our project on SpeechT5, which is a multimodal setup that can take in audio or text and output audio or text. It uses speaker embeddings to handle multiple speakers, so you could use Meta's S2ST to translate audio and this model to preserve the voice by doing audio-to-audio speech conversion. Here’s a Hugging Face tutorial which mentions speech conversion with SpeechT5: https://huggingface.co/blog/speecht5
WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Examples of how to use or integrate DeepSpeech
Tools for handling speech data in machine learning projects.
Project mention: Does anyone else find lhotse a pain to use | /r/speechtech | 2023-06-14
Python speech-recognition related posts
Ask HN: How do you get started with adding voice commands to a computer system?
2 projects | news.ycombinator.com | 21 Nov 2023
Distil-Whisper: a distilled variant of Whisper that is 6x faster
1 project | /r/AudioAI | 17 Nov 2023
FLaNK Stack Weekly 06 Nov 2023
21 projects | dev.to | 6 Nov 2023
AI — weekly megathread!
3 projects | /r/artificial | 5 Nov 2023
Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller
1 project | /r/hackernews | 3 Nov 2023
Distil-Whisper is up to 6x faster than Whisper while performing within 1% Word-Error-Rate on out-of-distribution eval sets
1 project | /r/speechtech | 2 Nov 2023
8 projects | news.ycombinator.com | 2 Nov 2023
What are some of the best open-source speech-recognition projects in Python? This list will help you: