Top 23 Python Speech Projects

MockingBird

9 33,736 5.8 Python

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
TTS

231 28,959 9.5 Python

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Project mention: OpenAI deems its voice cloning tool too risky for general release | news.ycombinator.com | 2024-03-31

lol this marketing technique is getting very old. https://github.com/coqui-ai/TTS is already amazing and open source.
datasets

15 18,345 9.5 Python

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
whisperX

24 8,869 8.7 Python

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06

It uses this, which does support diarization: https://github.com/m-bain/whisperX
EmotiVoice

5 6,234 8.9 Python

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Project mention: FLaNK Stack Weekly 12 February 2024 | dev.to | 2024-02-12
modelscope

3 5,984 9.7 Python

ModelScope: bring the notion of Model-as-a-Service to life.

Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20

Model as a Service https://github.com/modelscope/modelscope
lingvo

1 2,781 8.7 Python

Lingvo
aeneas

4 2,379 0.0 Python

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
gTTS

3 2,133 7.6 Python

Python library and CLI tool to interface with Google Translate's text-to-speech API

Project mention: Using Groq to Build a Real-Time Language Translation App | dev.to | 2024-04-05

For our real-time TTS needs, we'll employ the fantastic library called gTTS.
DeepFilterNet

10 1,886 9.1 Python

Noise suppression using deep filtering

Project mention: Anyone know of a good TTS pipeline for raw speech data? | /r/AudioAI | 2023-10-03

You mean remove background noise and transcribe? Then you can use DeepFilterNet to remove noise, and Whisper to transcribe.
whisper-timestamped

2 1,481 8.3 Python

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
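
Word-level timings like the ones discussed above are typically used to build subtitles. The following is a minimal sketch of that step, assuming each word arrives as a dict with "text", "start" and "end" keys (times in seconds). That input shape, the helper names `srt_time` and `words_to_srt`, and the `max_words` grouping rule are assumptions made for illustration, not the API of any library listed here.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from the first word's start to the last word's end.
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word timings (assumed shape, not real recognizer output).
words = [
    {"text": "Hello", "start": 0.0, "end": 0.42},
    {"text": "there,", "start": 0.42, "end": 0.8},
    {"text": "world.", "start": 1.1, "end": 1.65},
]
print(words_to_srt(words, max_words=2))
```

Grouping by a fixed word count keeps the sketch short. A real subtitling tool would more likely split on pauses or punctuation, which is where accurate word timings matter most.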
dc_tts

4 1,150 0.0 Python

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
pykaldi

2 977 5.7 Python

A Python wrapper for Kaldi
NATSpeech

4 944 1.8 Python

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
voicefixer

2 896 5.4 Python

General Speech Restoration

Project mention: Linux Audio Noise suppression using deep filtering in Rust | news.ycombinator.com | 2023-06-06
lhotse

1 861 9.0 Python

Tools for handling speech data in machine learning projects.

Project mention: Does anyone else find lhotse a pain to use | /r/speechtech | 2023-06-14
SALMONN

2 786 9.0 Python

SALMONN: Speech Audio Language Music Open Neural Network

Project mention: Comparing Humans, GPT-4, and GPT-4V on Abstraction and Reasoning Tasks | news.ycombinator.com | 2023-11-19

> In other words, if you express a problem in a more complicated space (e.g. a visual problem, or an abstract algebra problem), you will not be able to solve it in the smaller token space, there's not enough information
You're aware multimodal transformers do exactly this?
https://github.com/bytedance/SALMONN
diffwave

3 720 1.5 Python

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
inaSpeechSegmenter

3 692 6.4 Python

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

Project mention: Listen to HD radio with a $30 RTL SDR dongle | news.ycombinator.com | 2023-11-05

I have a little hobby project where I record an FM radio music station using a SDR and then remove all the non-music portions for offline listening. I like the music selections the DJs pick, but I prefer not to listen to the DJ commentary and the advertisements.
I evaluated three methods of recording: analog capture from a standalone FM receiver, using this nrsc5 library to record the "HD" radio stream, and using an AirSpy SDR with this library: https://github.com/jj1bdx/airspy-fmradion
Recording the "HD" (what a misnomer) radio was nice in that there was no hiss or multipath effects, but in comparison to the other methods the digital compression artifacts became impossible to un-hear. It seems to top out at about 96 kbps
The airspy-fmradion library has some nice stuff in it to address multipath, resulting in the best audio quality of the three methods I tested.
I use https://github.com/ina-foss/inaSpeechSegmenter to identify which segments of the recordings are speech vs. music.
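
The post-processing step described in this comment (keeping only the music portions of a recording) can be sketched as follows. This is an illustration under stated assumptions: the segmenter output is taken to be a list of (label, start, end) tuples with times in seconds and a "music" label, and the function name `music_spans` plus the `max_gap` and `min_length` thresholds are hypothetical choices, not part of inaSpeechSegmenter.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (label, start_seconds, end_seconds)


def music_spans(segments: List[Segment], max_gap: float = 1.0,
                min_length: float = 30.0) -> List[Tuple[float, float]]:
    """Return merged (start, end) spans labelled 'music'.

    Music spans separated by at most max_gap seconds are merged, and
    merged spans shorter than min_length seconds are dropped.
    """
    merged: List[List[float]] = []
    for label, start, end in sorted(segments, key=lambda s: s[1]):
        if label != "music":
            continue
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous music span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= min_length]


# Hypothetical segmenter output for a short recording.
segments = [
    ("speech", 0.0, 12.0),
    ("music", 12.0, 200.0),
    ("noEnergy", 200.0, 200.5),
    ("music", 200.5, 395.0),
    ("speech", 395.0, 460.0),
    ("music", 460.0, 470.0),
]
print(music_spans(segments))
```

The resulting spans could then be handed to an audio tool to cut the recording. Merging across short gaps avoids splitting one song at a brief silence, while the minimum length filters out short jingles inside ad breaks.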
Speech-enhancement

2 583 0.0 Python

Deep learning for audio denoising
allosaurus

2 502 0.0 Python

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
StarGANv2-VC

3 454 1.3 Python

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
UniSpeech

1 387 4.5 Python

UniSpeech - Large Scale Self-Supervised Learning for Speech

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-06.

Python Speech related posts

Easy video transcription and subtitling with Whisper, FFmpeg, and Python
1 project | news.ycombinator.com | 6 Apr 2024
Using Groq to Build a Real-Time Language Translation App
3 projects | dev.to | 5 Apr 2024
OpenAI deems its voice cloning tool too risky for general release
1 project | news.ycombinator.com | 31 Mar 2024
SOTA ASR Tooling: Long-Form Transcription
1 project | news.ycombinator.com | 31 Mar 2024
Deploying whisperX on AWS SageMaker as Asynchronous Endpoint
2 projects | dev.to | 31 Mar 2024
Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old
1 project | news.ycombinator.com | 28 Feb 2024
Base TTS (Amazon): The largest text-to-speech model to-date
3 projects | news.ycombinator.com | 14 Feb 2024

Index

What are some of the best open-source Speech projects in Python? This list will help you:

#	Project	Stars
1	MockingBird	33,736
2	TTS	28,959
3	datasets	18,345
4	whisperX	8,869
5	EmotiVoice	6,234
6	modelscope	5,984
7	lingvo	2,781
8	aeneas	2,379
9	gTTS	2,133
10	DeepFilterNet	1,886
11	whisper-timestamped	1,481
12	dc_tts	1,150
13	pykaldi	977
14	NATSpeech	944
15	voicefixer	896
16	lhotse	861
17	SALMONN	786
18	diffwave	720
19	inaSpeechSegmenter	692
20	Speech-enhancement	583
21	allosaurus	502
22	StarGANv2-VC	454
23	UniSpeech	387