Top 23 Python speech-recognition Projects
-
While it's tough to say anything specific without knowing exactly how you trained the model, the prompt format of your training input, or how you are performing inference, one thing I found when I faced similar issues is that the model does not know when to stop. Part of the reason is that the fast LLaMA tokenizer does not add the EOS token when encoding your inputs. So you can either add that token explicitly in your input text for each sample or use the slow LLaMA tokenizer. Check the llama-recipes GitHub repo for the exact issue: https://github.com/huggingface/transformers/issues/22794. The other thing you might want to check is whether the model.generate output contains the exact input tokens too; that is the expected behavior of some models (like Llama 2 or MPT) when you use vanilla transformers for inference.
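The first fix above can be sketched in a few lines. This is a minimal illustration, not the llama-recipes code: `add_eos` is a hypothetical helper, and `"</s>"` is assumed to be the tokenizer's EOS token string (check `tokenizer.eos_token` for your actual tokenizer).

```python
# Hypothetical helper: append the EOS token string to each training sample
# so the model learns when to stop generating. "</s>" is an assumption
# about the tokenizer's EOS token; verify it with tokenizer.eos_token.
def add_eos(samples, eos_token="</s>"):
    """Return copies of the training texts with the EOS token appended."""
    return [s if s.endswith(eos_token) else s + eos_token for s in samples]

samples = ["### Instruction: say hi\n### Response: hi"]
print(add_eos(samples)[0].endswith("</s>"))  # True
```

For the second point, if the generated sequence echoes the prompt, you can simply slice off the first `input_length` tokens of the output before decoding.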
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
PaddlePaddle/PaddleSpeech
-
Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06
I haven't tested WaveRNN, but of the ones I know, the best open-source option is FastPitch. And it's easy to use; here is the tutorial for voice cloning.
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
There is a great library that supports not only OpenAI's Whisper but many other engines, some of which also work offline: https://github.com/Uberi/speech_recognition
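A minimal sketch of how the SpeechRecognition library is typically used, assuming it is installed (`pip install SpeechRecognition`, plus `openai-whisper` for the offline Whisper backend). `transcribe_wav` is a hypothetical wrapper, not part of the library's API:

```python
def transcribe_wav(path, offline=True):
    """Transcribe a WAV file with the SpeechRecognition library.

    With offline=True this uses the local Whisper backend (requires the
    openai-whisper package); otherwise it falls back to the online
    Google Web Speech API.
    """
    # Imported lazily so this sketch loads even without the dependency.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file into memory
    if offline:
        return recognizer.recognize_whisper(audio, model="base")
    return recognizer.recognize_google(audio)
```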
-
Project mention: [D] What's stopping you from working on speech and voice? | /r/MachineLearning | 2023-01-30
- https://github.com/espnet/espnet
-
You likely have a very broken sample in one of your batches. It looks like your training actually went through a few batches before it threw the error at you. A quick Google search shows a similar issue in the GitHub repo: https://github.com/speechbrain/speechbrain/issues/649
-
Project mention: Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller | news.ycombinator.com | 2023-10-31
That's the implication. If the distil models are in the same format as the original OpenAI models, then they can be converted for faster-whisper use as per the conversion instructions on https://github.com/guillaumekln/faster-whisper/
Then we'll see whether we get the 6x model speedup on top of the stated 4x faster-whisper code speedup.
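For reference, inference with faster-whisper looks roughly like the sketch below. The model name is an assumption here; at the time of the comment, a converted Distil-Whisper checkpoint for faster-whisper did not yet exist, so you would substitute whatever converted model name becomes available (or a stock size like `"large-v2"`).

```python
def transcribe_fast(path, model_size="large-v2"):
    """Sketch of faster-whisper inference (pip install faster-whisper).

    model_size would be replaced with the name of a converted
    Distil-Whisper checkpoint once one is published.
    """
    # Imported lazily so this sketch loads even without the dependency.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path)
    return " ".join(segment.text for segment in segments)
```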
-
Project mention: I made a ChatGPT virtual assistant that you can talk to | /r/ArtificialInteligence | 2023-04-05
I call it DaVinci. DaVinci uses Picovoice (https://picovoice.ai/) solutions for wake-word and voice-activity detection and for converting speech to text, Amazon Polly to convert its responses into a natural-sounding voice, and OpenAI's GPT-3.5 to do the heavy lifting. It's all contained in about 300 lines of Python code.
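One turn of that pipeline can be sketched as follows. This is not the project's actual code: `assistant_turn` is a hypothetical function, the wake-word and speech-to-text stages (Picovoice in the original) are assumed to have already produced `transcript`, and the clients are assumed to be a configured OpenAI v1 client and a boto3 Polly client.

```python
def assistant_turn(transcript, openai_client, polly_client):
    """One turn of the described pipeline: text in -> GPT-3.5 reply -> Polly audio.

    transcript: user speech already converted to text upstream.
    Returns (reply_text, mp3_bytes).
    """
    # Ask GPT-3.5 for a response (OpenAI Python SDK v1 interface).
    reply = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": transcript}],
    ).choices[0].message.content

    # Synthesize the reply with Amazon Polly ("Matthew" is an example voice).
    speech = polly_client.synthesize_speech(
        Text=reply, OutputFormat="mp3", VoiceId="Matthew"
    )
    return reply, speech["AudioStream"].read()
```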
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Project mention: Distil-Whisper: a distilled variant of Whisper that is 6x faster | /r/AudioAI | 2023-11-17
Training code will be released in the Distil-Whisper repository this week, enabling anyone in the community to distill a Whisper model in their choice of language!
-
lip-reading-deeplearning
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
-
I have found this package called Kalliope which is a personal assistant framework. I tried pip3 install kalliope, but I get an error on installing pyalsaaudio:
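The pyalsaaudio build failure usually means the ALSA C headers are missing. On Debian/Ubuntu (an assumption about the platform; other distros package the headers under a different name, e.g. `alsa-lib-devel` on Fedora), the usual fix is:

```shell
# pyalsaaudio compiles against the ALSA headers, provided by libasound2-dev
sudo apt-get install libasound2-dev
pip3 install pyalsaaudio
pip3 install kalliope
```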
-
Project mention: How I converted a podcast into a knowledge base using Orama search and OpenAI whisper and Astro | dev.to | 2023-05-23
-
Project mention: Does this SincNet (neural architecture) contain a discriminator? | /r/learnmachinelearning | 2022-12-30
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
whisper-timestamped is a layer on top of the Whisper family of models that produces accurate word-level timestamps and includes filler words in the transcription output. The transcriber downloads the selected Whisper model to the machine running the demo, and no third-party API keys are required.
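A minimal sketch of pulling word-level timestamps out of whisper-timestamped, assuming the package is installed (`pip install whisper-timestamped`); `transcribe_with_words` is a hypothetical wrapper:

```python
def transcribe_with_words(path, model_name="tiny"):
    """Sketch of whisper-timestamped usage.

    Returns a list of (word, start_seconds, end_seconds) tuples.
    """
    # Imported lazily so this sketch loads even without the dependency.
    import whisper_timestamped as whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = whisper.transcribe(model, whisper.load_audio(path))
    # Each segment carries a "words" list with per-word timing.
    return [
        (word["text"], word["start"], word["end"])
        for segment in result["segments"]
        for word in segment["words"]
    ]
```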
-
kaldi-gstreamer-server
Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
Project mention: Real-time full-duplex speech recognition server, based on Kaldi and GStreamer | news.ycombinator.com | 2022-12-01
-
speechpy
:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
-
Project mention: [HELP] Speech2Speech translator with speaker voice preservation | /r/learnmachinelearning | 2023-05-20
Hey! I'm doing a somewhat similar project, but for TTS / voice cloning. This might not be too relevant for you, but it might be one way to solve your problem. We based our project on SpeechT5, which is a multimodal setup that can take in audio or text and output audio or text. It uses speaker embeddings to handle multiple speakers, so you could use Meta's S2ST to translate the audio and this model to preserve the voice by doing audio-to-audio speech conversion. Here's a Hugging Face tutorial which mentions speech conversion with SpeechT5: https://huggingface.co/blog/speecht5
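The voice-conversion step described above can be sketched with the Hugging Face transformers checkpoints, assuming `transformers` and `torch` are installed. `convert_voice` is a hypothetical wrapper, and the speaker embedding is assumed to be a 1x512 x-vector tensor computed elsewhere (e.g. with a speaker-verification model):

```python
def convert_voice(source_audio, sampling_rate, speaker_embedding):
    """Sketch of SpeechT5 voice conversion: re-synthesize source_audio
    in the voice described by speaker_embedding (a 1x512 tensor).
    Returns a waveform tensor.
    """
    # Imported lazily so this sketch loads even without the dependencies.
    from transformers import (
        SpeechT5ForSpeechToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_vc")
    model = SpeechT5ForSpeechToSpeech.from_pretrained("microsoft/speecht5_vc")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(
        audio=source_audio, sampling_rate=sampling_rate, return_tensors="pt"
    )
    return model.generate_speech(
        inputs["input_values"], speaker_embedding, vocoder=vocoder
    )
```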
-
Python speech-recognition related posts
- Ask HN: How do you get started with adding voice commands to a computer system?
- Distil-Whisper: a distilled variant of Whisper that is 6x faster
- FLaNK Stack Weekly 06 Nov 2023
- AI — weekly megathread!
- Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller
- Distil-Whisper is up to 6x faster than Whisper while performing within 1% Word-Error-Rate on out-of-distribution eval sets
- Talk-Llama
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 116,187 |
2 | PaddleSpeech | 9,156 |
3 | NeMo | 8,556 |
4 | SpeechRecognition | 7,693 |
5 | espnet | 7,400 |
6 | speechbrain | 6,872 |
7 | faster-whisper | 5,814 |
8 | Porcupine | 3,251 |
9 | lingvo | 2,765 |
10 | distil-whisper | 2,377 |
11 | lip-reading-deeplearning | 1,776 |
12 | kalliope | 1,675 |
13 | Dragonfire | 1,363 |
14 | whisper-asr-webservice | 1,122 |
15 | SincNet | 1,060 |
16 | whisper-timestamped | 1,047 |
17 | kaldi-gstreamer-server | 1,038 |
18 | pykaldi | 965 |
19 | speechpy | 883 |
20 | SpeechT5 | 861 |
21 | vosk-server | 776 |
22 | DeepSpeech-examples | 772 |
23 | lhotse | 766 |