Top 23 Python speech-recognition Projects
-
transformers
Play around with OpenAI's GPT models and Hugging Face's Transformers library.
-
whisperX
Project mention: WhisperX: Precise ASR with Word-Level Timestamps and Diarization | news.ycombinator.com | 2024-09-05
-
faster-whisper
Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26
I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably a work in progress, but it does the job and has a nice UI for editing transcriptions, speakers, etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4 GB Nvidia GPU, which faster-whisper is actually able to do [1]
https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
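For reference, a minimal sketch of how faster-whisper can be fit onto a small GPU by picking a compact model and 8-bit quantization; the model size, compute type, and file name below are illustrative assumptions rather than anything from the thread:

```python
# Hedged sketch: faster-whisper on a ~4 GB NVIDIA GPU.
# "small" + int8_float16 is an assumption about what fits in that much VRAM.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="int8_float16")

# "meeting.wav" is a placeholder input file.
segments, info = model.transcribe("meeting.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```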
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
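As a quick illustration, PaddleSpeech exposes CLI-style executors that can be called from Python; the model name, language, and file name here are assumptions based on the project's published examples:

```python
# Hedged sketch of PaddleSpeech's Python ASR API via its CLI executor.
from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()
text = asr(
    audio_file="zh.wav",            # placeholder: 16 kHz mono WAV
    model="conformer_wenetspeech",  # assumed Mandarin ASR model from the docs
    lang="zh",
    sample_rate=16000,
)
print(text)
```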
-
speechbrain
Simple Diarizer is a speaker diarization library that utilizes pretrained models from SpeechBrain. To get started with simple_diarizer, see the sketch below.
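A minimal sketch, assuming the xvec embedding model and spectral clustering; the file name and speaker count are placeholders:

```python
# Hedged sketch: speaker diarization with simple_diarizer,
# which wraps pretrained SpeechBrain embedding models.
from simple_diarizer.diarizer import Diarizer

diar = Diarizer(
    embed_model="xvec",   # "ecapa" is the other documented option
    cluster_method="sc",  # spectral clustering
)

# "interview.wav" and num_speakers=2 are placeholder assumptions.
segments = diar.diarize("interview.wav", num_speakers=2)
for seg in segments:
    # Each segment is expected to carry start/end times and a speaker label.
    print(seg)
```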
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Start and Stop Listening Example
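A rough sketch of the library's background-listening pattern; the Google Web Speech engine, calibration step, and timings are assumptions:

```python
# Hedged sketch: start and stop background listening with SpeechRecognition.
import time
import speech_recognition as sr

r = sr.Recognizer()

def on_audio(recognizer, audio):
    # Google Web Speech API is just one of the supported engines.
    try:
        print("Heard:", recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Could not understand audio")

mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source)

# listen_in_background returns a function that stops the listener.
stop_listening = r.listen_in_background(mic, on_audio)
time.sleep(10)                       # keep listening for ~10 seconds
stop_listening(wait_for_stop=False)  # then stop the background thread
```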
-
espnet
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17
You might check out this list from espnet. They list the different corpora they use to train their models, sorted by language and task (ASR, TTS, etc.):
https://github.com/espnet/espnet/blob/master/egs2/README.md
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
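To give a flavor of the toolkit, here is a hedged sketch of FunASR's AutoModel API combining ASR, VAD, and punctuation restoration; the model identifiers and file name are assumptions based on the project's examples:

```python
# Hedged sketch: ASR + VAD + punctuation restoration with FunASR.
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",  # assumed Paraformer ASR checkpoint
    vad_model="fsmn-vad",   # assumed voice activity detection model
    punc_model="ct-punc",   # assumed punctuation model
)

# "speech.wav" is a placeholder input file.
result = model.generate(input="speech.wav")
print(result[0]["text"])  # generate is expected to return a list of result dicts
```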
-
wenet
-
Porcupine
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Details will be shared tomorrow, but from what I have read, they have distilled the large model's decoder into this turbo model, which has only 4 layers instead of 32; the encoder should remain the same size. Similar to https://github.com/huggingface/distil-whisper, but the model is distilled using multilingual data instead of just English, and the decoder has 4 layers instead of 2.
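For comparison, distil-whisper checkpoints can be loaded through the Hugging Face Transformers pipeline; the checkpoint name and audio file below are assumptions:

```python
# Hedged sketch: running a distil-whisper checkpoint via the Transformers pipeline.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",  # assumed checkpoint name
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)

# chunk_length_s enables long-form transcription by chunking the audio.
result = asr("podcast.mp3", chunk_length_s=25, return_timestamps=True)
print(result["text"])
```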
-
lingvo
-
whisper-asr-webservice
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them; whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
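A hedged sketch of how whisper-timestamped is typically called; the model size and file name are placeholders:

```python
# Hedged sketch: word-level timestamps and confidence with whisper-timestamped.
import whisper_timestamped as whisper

audio = whisper.load_audio("video_audio.wav")  # placeholder input file
model = whisper.load_model("small")            # placeholder model size

result = whisper.transcribe(model, audio)
for segment in result["segments"]:
    for word in segment["words"]:
        print(f'{word["start"]:.2f}-{word["end"]:.2f}s '
              f'{word["text"]} (conf={word["confidence"]:.2f})')
```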
-
lip-reading-deeplearning
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
-
kalliope
-
Dragonfire
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win
It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...
Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.
-
SpeechT5
-
SincNet
-
kaldi-gstreamer-server
Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
-
quillman
A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
https://github.com/modal-labs/quillman
I built something similar using WebKit speech recognition but that's limited to Chromium.
-
SALMONN
Project mention: Comparing Humans, GPT-4, and GPT-4V on Abstraction and Reasoning Tasks | news.ycombinator.com | 2023-11-19
> In other words, if you express a problem in a more complicated space (e.g. a visual problem, or an abstract algebra problem), you will not be able to solve it in the smaller token space, there's not enough information
You're aware multimodal transformers do exactly this?
https://github.com/bytedance/SALMONN
Python speech-recognition related posts
-
Show HN: Auto Transcribe/Subtitle Videos – Python OpenAI Whisper Wrapper and GUI
-
New OpenAI Whisper model: "turbo"
-
WhisperX: Precise ASR with Word-Level Timestamps and Diarization
-
WhisperX: Precise ASR with Word-Level Timestamps and Speaker Diarization
-
OctoTube: Voice Search for YouTube
-
OTranscribe: A free and open tool for transcribing audio interviews
-
StreamSpeech: "All in One" model for simultaneous ASR, translation and TTS
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 132,909 |
2 | whisperX | 11,720 |
3 | faster-whisper | 11,628 |
4 | PaddleSpeech | 10,970 |
5 | speechbrain | 8,624 |
6 | SpeechRecognition | 8,356 |
7 | espnet | 8,346 |
8 | FunASR | 6,179 |
9 | wenet | 4,085 |
10 | Porcupine | 3,706 |
11 | distil-whisper | 3,538 |
12 | lingvo | 2,811 |
13 | whisper-asr-webservice | 1,988 |
14 | whisper-timestamped | 1,913 |
15 | lip-reading-deeplearning | 1,820 |
16 | kalliope | 1,711 |
17 | Dragonfire | 1,384 |
18 | whisper-standalone-win | 1,188 |
19 | SpeechT5 | 1,163 |
20 | SincNet | 1,115 |
21 | kaldi-gstreamer-server | 1,069 |
22 | quillman | 1,030 |
23 | SALMONN | 996 |