Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality. Learn more →
Top 23 Python speech-recognition Projects
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. |语音识别工具包,包含丰富的性能优越的开源预训练模型,支持语音识别、语音端点检测、文本后处理等,具备服务部署能力。
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
-
lip-reading-deeplearning
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
-
kaldi-gstreamer-server
Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framwork.
-
speechpy
:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Project mention: Maxtext: A simple, performant and scalable Jax LLM | news.ycombinator.com | 2024-04-23Is t5x an encoder/decoder architecture?
Some more general options.
The Flax ecosystem
https://github.com/google/flax?tab=readme-ov-file
or dm-haiku
https://github.com/google-deepmind/dm-haiku
were some of the best developed communities in the Jax AI field
Perhaps the “trax” repo? https://github.com/google/trax
Some HF examples https://github.com/huggingface/transformers/tree/main/exampl...
Sadly it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details. https://github.com/xai-org/grok-1/blob/main/run.py
PaddlePaddle/PaddleSpeech
Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06It uses this, which does support diarization: https://github.com/m-bain/whisperX
For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
Start and Stop Listening Example
Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17You might check out this list from espnet. They list the different corpuses they use to train their models sorted by language and task (ASR, TTS etc):
https://github.com/espnet/espnet/blob/master/egs2/README.md
wenet-e2e/wenet
Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
Project mention: How I converted a podcast into a knowledge base using Orama search and OpenAI whisper and Astro | dev.to | 2023-05-23
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
Project mention: [HELP] Speech2Speech translator with speaker voice preservation | /r/learnmachinelearning | 2023-05-20Hey! I’m doing a somewhat similar project but for TTS / voice cloning. This might not be too relevant for you but it might be one way to solve your problem. We based our project onSpeecht5 which is a multimodal setup that can take in audio or text and output audio or text. It uses speaker embeddings to handle multiple speakers, so you could use Metas S2ST to translate audio and this model to preserve the voice by doing audio to audio speech conversion. Here’s a hugging tutorial which mentions speech conversion with speecht5 https://huggingface.co/blog/speecht5
Python speech-recognition related posts
- Easy video transcription and subtitling with Whisper, FFmpeg, and Python
- SOTA ASR Tooling: Long-Form Transcription
- Deploying whisperX on AWS SageMaker as Asynchronous Endpoint
- SpeechBrain 1.0: A free and open-source AI toolkit for all things speech
- Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old
- Speech-to-Text Benchmark
- FunASR: Fundamental End-to-End Speech Recognition Toolkit
-
A note from our sponsor - InfluxDB
www.influxdata.com | 25 Apr 2024
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
Project | Stars | |
---|---|---|
1 | transformers | 124,557 |
2 | PaddleSpeech | 10,120 |
3 | whisperX | 8,965 |
4 | faster-whisper | 8,723 |
5 | SpeechRecognition | 8,040 |
6 | speechbrain | 7,869 |
7 | espnet | 7,872 |
8 | wenet | 3,691 |
9 | Porcupine | 3,424 |
10 | FunASR | 3,110 |
11 | distil-whisper | 3,125 |
12 | lingvo | 2,780 |
13 | lip-reading-deeplearning | 1,786 |
14 | kalliope | 1,696 |
15 | whisper-asr-webservice | 1,617 |
16 | whisper-timestamped | 1,501 |
17 | Dragonfire | 1,372 |
18 | SincNet | 1,097 |
19 | kaldi-gstreamer-server | 1,054 |
20 | SpeechT5 | 1,018 |
21 | pykaldi | 978 |
22 | speechpy | 880 |
23 | lhotse | 863 |
Sponsored