SaaSHub helps you find the best software and product alternatives Learn more →
Top 23 Python Speech Projects
-
Project mention: Show HN: Voice-Pro – AI Voice Cloning Magic: Transform Any Voice in 15 Seconds | news.ycombinator.com | 2024-11-27
It's really easy for a technical person to do as well.
I use Coqui TTS[0] as part of my home automation. I wrote a small Python script that lets me upload a voice clip for it to clone, after I got the idea from HeyWillow[1], and a small shim that lets me send the output to a Home Assistant media player instead of using their standard output device. I run the TTS container on a VM with a Tesla P4 (~£100 to buy) and get about 1x-2x (roughly the same time to process as it would take to say it) using the large model.
Just for a giggle, I uploaded a few 3-5 second clips of myself speaking and cloned my voice, then sent a command to our living room media player to call my wife into the room; from another room, she was 100% convinced it was me speaking words I'd never spoken.
I tried playing with a variety of sentences for a few hours and overall, it sounded almost exactly like me, to me, with the exception of some "attitude" and "intonation" I know I wouldn't use in my speech. I didn't notice much of an improvement using much longer clips; the short ones were "good enough".
Tangentially, it really bugs me that most phone providers in the UK now insist you record a "personal greeting" before they'll let you check your voicemail box. I just record silence, because the last thing I want or need is a voicemail greeting in my voice confirming, to some randomer I didn't want calling me, who I am and that my number is active, even more so knowing how I can
[0] https://github.com/coqui-ai/TTS
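The "about 1x-2x" figure in the comment above reads like a real-time factor: processing time divided by the duration of the audio produced. That reading is an assumption, since the comment does not define the ratio. A minimal sketch of how such a number could be measured follows; `fake_synthesize` is a hypothetical stand-in and not part of Coqui TTS.

```python
import time
from typing import Callable


def real_time_factor(synthesize: Callable[[str], float], text: str) -> float:
    """Return processing_seconds / audio_seconds for one synthesis call.

    `synthesize` is a hypothetical callable that renders `text` and returns
    the duration of the generated audio in seconds.
    """
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> float:
    """Stand-in for a real TTS call: pretends speech runs at 15 chars/second."""
    time.sleep(0.01)  # simulated processing time
    return len(text) / 15.0


if __name__ == "__main__":
    rtf = real_time_factor(fake_synthesize, "Dinner is ready, come to the kitchen.")
    print(f"real-time factor: {rtf:.3f}")
```

Under this definition, a value near 1 means the audio takes about as long to generate as it does to play back.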
-
MockingBird
🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
-
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Project mention: 20 Open Source Tools I Recommend to Build, Share, and Run AI Projects | dev.to | 2024-11-13
Datasets library repository for accessing and sharing datasets with the community.
-
Yes, it's still relevant, but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX
-
Project mention: AI Voice Agents: Opensource, Pre-Trained Voice Activity Detector | news.ycombinator.com | 2024-07-28
-
Project mention: Ask HN: What is the state of OSS voice cloning? | news.ycombinator.com | 2024-09-30
-
Project mention: Real-time ML audio noise suppression on Raspberry Pi Pico 2 | news.ycombinator.com | 2024-08-09
Very cool! Would be curious to see how this compares to https://github.com/Rikorose/DeepFilterNet written in Rust.
Or this Samsung Research paper https://research.samsung.com/blog/FSPEN-AN-ULTRA-LIGHTWEIGHT...
-
aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
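Forced alignment produces a list of text fragments, each with a begin and end time in the audio. As a rough illustration of what can be done with that output, the sketch below turns a sync map into SRT subtitles. The dictionary shape (`fragments` with `begin`, `end`, and `lines`) is assumed here to mirror a JSON sync map, and the sample data is made up; the code does not call aeneas itself.

```python
def to_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def sync_map_to_srt(sync_map: dict) -> str:
    """Render each aligned fragment as one numbered SRT cue."""
    cues = []
    for index, fragment in enumerate(sync_map["fragments"], start=1):
        begin = to_srt_time(float(fragment["begin"]))
        end = to_srt_time(float(fragment["end"]))
        text = "\n".join(fragment["lines"])
        cues.append(f"{index}\n{begin} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"


# Made-up sample in the assumed sync map shape.
EXAMPLE_SYNC_MAP = {
    "fragments": [
        {"begin": "0.000", "end": "2.680", "lines": ["First sample line."]},
        {"begin": "2.680", "end": "5.480", "lines": ["Second sample line."]},
    ]
}

if __name__ == "__main__":
    print(sync_map_to_srt(EXAMPLE_SYNC_MAP))
```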
-
For our real-time TTS needs, we'll employ the fantastic library called gTTS.
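A minimal sketch of how gTTS might be used when low latency matters: split the text into sentences and synthesize each one separately, so playback of the first chunk can start before the rest is ready. The sentence splitter and the `chunk_NNN.mp3` file naming are illustrative assumptions, not part of gTTS; the only gTTS calls used are the `gTTS(text=..., lang=...)` constructor and `save()`, which need the `gtts` package and network access.

```python
import re
from typing import Iterator


def split_sentences(text: str) -> Iterator[str]:
    """Yield sentences, splitting after '.', '!' or '?' followed by whitespace."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield sentence


def synthesize_to_files(text: str, prefix: str = "chunk", lang: str = "en") -> list[str]:
    """Save one MP3 per sentence with gTTS and return the file names."""
    from gtts import gTTS  # imported lazily so the splitter works without gtts

    paths = []
    for index, sentence in enumerate(split_sentences(text)):
        path = f"{prefix}_{index:03d}.mp3"
        gTTS(text=sentence, lang=lang).save(path)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for sentence in split_sentences("Hello there. How are you? Fine!"):
        print(sentence)
```

The regular-expression splitter is deliberately naive; abbreviations such as "Dr." would be split incorrectly.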
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
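Word-level confidence is useful for flagging parts of a transcript that deserve a manual check. The sketch below assumes a result dictionary in which each segment carries a `words` list with `text`, `start`, `end`, and `confidence` keys; that shape, the 0.5 threshold, and the sample data are assumptions for illustration, and the code does not run the recognizer itself.

```python
def low_confidence_words(
    result: dict, threshold: float = 0.5
) -> list[tuple[str, float, float, float]]:
    """Return (text, start, end, confidence) for words below the threshold."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["confidence"] < threshold:
                flagged.append(
                    (word["text"], word["start"], word["end"], word["confidence"])
                )
    return flagged


# Made-up sample in the assumed result shape.
SAMPLE_RESULT = {
    "segments": [
        {
            "words": [
                {"text": "hello", "start": 0.0, "end": 0.5, "confidence": 0.92},
                {"text": "wurld", "start": 0.6, "end": 1.1, "confidence": 0.31},
            ]
        }
    ]
}

if __name__ == "__main__":
    for text, start, end, confidence in low_confidence_words(SAMPLE_RESULT):
        print(f"{start:.2f}-{end:.2f}s  {text!r}  confidence={confidence:.2f}")
```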
-
Project mention: Toucan TTS: MIT licensed Text to Speech in 7000 languages | news.ycombinator.com | 2024-06-20
-
StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Has anyone had any luck with an offline, free, open-source real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?
* https://github.com/ictnlp/StreamSpeech
* https://github.com/k2-fsa/sherpa-onnx
* https://github.com/openai/whisper
I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a 3- to 5-second delay is manageable.
RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.
Any suggestions?
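The pipeline described in the question above (recognize English, translate, then synthesize) is often called a cascade. A minimal sketch of such a cascade with per-stage timing follows, so the total can be compared against the 3- to 5-second delay mentioned. The three stage functions are hypothetical stubs and do not correspond to the API of any project linked above.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class StageTiming:
    name: str
    seconds: float


def run_cascade(
    audio: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> tuple[bytes, list[StageTiming]]:
    """Run speech -> text -> translated text -> speech, timing each stage."""
    timings: list[StageTiming] = []

    def timed(name, stage, value):
        start = time.perf_counter()
        output = stage(value)
        timings.append(StageTiming(name, time.perf_counter() - start))
        return output

    text = timed("recognize", recognize, audio)
    translated = timed("translate", translate, text)
    speech = timed("synthesize", synthesize, translated)
    return speech, timings


def within_budget(timings: list[StageTiming], budget_seconds: float) -> bool:
    """Check whether the summed stage times fit the latency budget."""
    return sum(t.seconds for t in timings) <= budget_seconds


# Hypothetical stubs standing in for real ASR, translation, and TTS models.
def stub_recognize(audio: bytes) -> str:
    return "hello"


def stub_translate(text: str) -> str:
    return f"[ko] {text}"


def stub_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    speech, timings = run_cascade(
        b"fake-audio", stub_recognize, stub_translate, stub_synthesize
    )
    for t in timings:
        print(f"{t.name}: {t.seconds:.6f}s")
    print("within 5 s budget:", within_budget(timings, 5.0))
```

Timing each stage separately shows which model dominates the delay on a given device, which matters most on under-powered hardware.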
-
NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
Python Speech related posts
-
Show HN: Mikey – No bot meeting notetaker for Windows
-
Ask HN: Is Whisper Still Relevant?
-
Show HN: Using YOLO to Detect Office Chairs in 40M Hotel Photos
-
Transcriber AI – Free, end-to-end machine based transcription with speaker id
-
Supercharge Your AI Skills: 5 Open Source Repositories You Can't Afford to Miss
-
Show HN: Offline audiobook from any format with one CLI command
-
WhisperX: Precise ASR with Word-Level Timestamps and Diarization
-
A note from our sponsor - SaaSHub
www.saashub.com | 27 Mar 2025
Index
What are some of the best open-source Speech projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | TTS | 38,861 |
2 | MockingBird | 36,008 |
3 | datasets | 19,851 |
4 | whisperX | 14,571 |
5 | AudioGPT | 10,116 |
6 | EmotiVoice | 7,753 |
7 | modelscope | 7,598 |
8 | silero-vad | 5,314 |
9 | metavoice-src | 4,076 |
10 | DeepFilterNet | 2,930 |
11 | lingvo | 2,833 |
12 | aeneas | 2,609 |
13 | whisper-asr-webservice | 2,454 |
14 | gTTS | 2,423 |
15 | whisper-timestamped | 2,322 |
16 | IMS-Toucan | 1,564 |
17 | SALMONN | 1,191 |
18 | dc_tts | 1,159 |
19 | voicefixer | 1,113 |
20 | StreamSpeech | 1,041 |
21 | pykaldi | 1,010 |
22 | lhotse | 987 |
23 | NATSpeech | 971 |