Top 23 Python Speech Projects
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
Project mention: OpenAI deems its voice cloning tool too risky for general release | news.ycombinator.com | 2024-03-31
lol this marketing technique is getting very old. https://github.com/coqui-ai/TTS is already amazing and open source.
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑💻 🥇 | dev.to | 2023-10-19
whisperX
Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06
It uses this, which does support diarization: https://github.com/m-bain/whisperX
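As a rough sketch of what that looks like in practice, the pipeline below follows the whisperX README: transcribe, align for word-level timestamps, then diarize and assign speakers. The exact call signatures may differ between whisperX versions, so treat the library calls as assumptions to verify; the SRT timestamp helper is plain Python.

```python
# Sketch of a transcription + diarization pipeline with whisperX.
# The whisperx calls follow the project README; signatures may vary
# between versions, so check them against your installed release.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def transcribe_with_speakers(audio_file: str, device: str = "cpu") -> None:
    import whisperx  # pip install whisperx

    model = whisperx.load_model("large-v2", device)
    audio = whisperx.load_audio(audio_file)
    result = model.transcribe(audio)

    # Align for word-level timestamps, then assign speakers.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device)
    result = whisperx.align(result["segments"], align_model, metadata,
                            audio, device)
    diarize = whisperx.DiarizationPipeline(device=device)
    result = whisperx.assign_word_speakers(diarize(audio), result)

    for seg in result["segments"]:
        print(f"{srt_timestamp(seg['start'])} --> "
              f"{srt_timestamp(seg['end'])} "
              f"[{seg.get('speaker', '?')}] {seg['text'].strip()}")
```

The printed lines are already close to SRT cue format, which makes feeding them to FFmpeg for subtitling straightforward.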
modelscope
Model as a Service: https://github.com/modelscope/modelscope
aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
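A minimal forced-alignment run looks roughly like the example in the aeneas documentation: build a `Task` from a configuration string, point it at an audio file and a text file, and execute. The file paths here are placeholders.

```python
# Minimal forced-alignment sketch using the aeneas Task API, modelled
# on the project's documented example. Paths are placeholders.

def make_config(language: str = "eng", text_type: str = "plain",
                out_format: str = "json") -> str:
    """Build an aeneas task configuration string."""
    return (f"task_language={language}"
            f"|is_text_type={text_type}"
            f"|os_task_file_format={out_format}")

def align(audio_path: str, text_path: str, syncmap_path: str) -> None:
    from aeneas.executetask import ExecuteTask  # pip install aeneas
    from aeneas.task import Task

    task = Task(config_string=make_config())
    task.audio_file_path_absolute = audio_path
    task.text_file_path_absolute = text_path
    task.sync_map_file_path_absolute = syncmap_path

    ExecuteTask(task).execute()   # run the aligner
    task.output_sync_map_file()   # write the sync map to disk
```

The resulting JSON sync map pairs each text fragment with its start and end time in the audio.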
gTTS
For our real-time TTS needs, we'll employ the fantastic gTTS library.
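A small sketch of that usage, assuming gTTS's documented `gTTS(text=..., lang=...).save(...)` API (it calls Google Translate's TTS endpoint, so network access is required). Splitting long text into sentences first lets playback of early chunks start before the whole text is synthesized; the splitter below is a naive, purely illustrative helper.

```python
# Sentence-by-sentence synthesis with gTTS. The gTTS call follows the
# library's documented API; the file naming scheme is our own choice.
import re

def sentences(text: str) -> list[str]:
    """Naive sentence splitter for feeding gTTS piece by piece."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def speak_to_files(text: str, lang: str = "en") -> None:
    from gtts import gTTS  # pip install gTTS

    for i, sentence in enumerate(sentences(text)):
        # Each sentence becomes its own MP3, playable as soon as saved.
        gTTS(text=sentence, lang=lang).save(f"chunk_{i:03d}.mp3")
```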
DeepFilterNet
You mean remove background noise and transcribe? Then you can use DeepFilterNet to remove the noise and Whisper to transcribe.
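That two-step pipeline can be sketched as below: denoise with DeepFilterNet's bundled `deepFilter` command-line tool, then transcribe the cleaned file with openai-whisper. The CLI flag and the output filename pattern are assumptions to check against your installed DeepFilterNet version.

```python
# Denoise-then-transcribe sketch: DeepFilterNet CLI + openai-whisper.
# Flag names and output naming are assumptions; verify locally.
import subprocess
from pathlib import Path

def denoise_cmd(wav: str, out_dir: str) -> list[str]:
    """Command line for DeepFilterNet's bundled `deepFilter` tool."""
    return ["deepFilter", wav, "--output-dir", out_dir]

def denoise_and_transcribe(wav: str, out_dir: str = "denoised") -> str:
    import whisper  # pip install openai-whisper

    subprocess.run(denoise_cmd(wav, out_dir), check=True)
    # deepFilter writes <stem>_DeepFilterNet*.wav (varies by model).
    cleaned = next(Path(out_dir).glob(Path(wav).stem + "*.wav"))
    return whisper.load_model("base").transcribe(str(cleaned))["text"]
```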
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them; whisper-timestamped seems to be the best one. [0]
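A sketch of getting word-level timestamps this way, following the whisper-timestamped README (`whisper.load_model` plus the module-level `whisper.transcribe(model, audio)` call). Dropping low-confidence words, as the pure helper below does, is one simple way to cope with the shaky timings mentioned above; the 0.5 threshold is an arbitrary example value.

```python
# Word-level timestamps with whisper-timestamped, per its README.
# The confidence filter is our own illustrative post-processing step.

def confident_words(result: dict, threshold: float = 0.5) -> list[dict]:
    """Keep only words whose confidence meets the threshold."""
    return [word
            for seg in result.get("segments", [])
            for word in seg.get("words", [])
            if word.get("confidence", 0.0) >= threshold]

def transcribe_words(path: str) -> None:
    import whisper_timestamped as whisper  # pip install whisper-timestamped

    model = whisper.load_model("tiny", device="cpu")
    result = whisper.transcribe(model, whisper.load_audio(path))
    for w in confident_words(result):
        print(f"{w['start']:7.2f}-{w['end']:7.2f}  {w['text']}")
```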
NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
Project mention: Linux Audio Noise suppression using deep filtering in Rust | news.ycombinator.com | 2023-06-06
Project mention: Comparing Humans, GPT-4, and GPT-4V on Abstraction and Reasoning Tasks | news.ycombinator.com | 2023-11-19
> In other words, if you express a problem in a more complicated space (e.g. a visual problem, or an abstract algebra problem), you will not be able to solve it in the smaller token space, there's not enough information
You're aware multimodal transformers do exactly this?
inaSpeechSegmenter
CNN-based audio segmentation toolkit. It detects speech, music, noise, and speaker gender, and was designed for large-scale gender-equality studies based on speech time per gender.
I have a little hobby project where I record an FM radio music station using an SDR and then remove all the non-music portions for offline listening. I like the music selections the DJs pick, but I prefer not to listen to the DJ commentary and the advertisements.
I evaluated three methods of recording: analog capture from a standalone FM receiver, using the nrsc5 library to record the "HD" radio stream, and using an AirSpy SDR with this library: https://github.com/jj1bdx/airspy-fmradion
Recording the "HD" (what a misnomer) radio was nice in that there was no hiss or multipath effects, but in comparison to the other methods the digital compression artifacts became impossible to un-hear. It seems to top out at about 96 kbps.
The airspy-fmradion library has some nice stuff in it to address multipath, resulting in the best audio quality of the three methods I tested.
I use https://github.com/ina-foss/inaSpeechSegmenter to identify which segments of the recordings are speech vs. music.
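The segmentation step of that workflow can be sketched like this, assuming the `(label, start, stop)` tuple format shown in the inaSpeechSegmenter README; the music filter is a trivial helper of our own.

```python
# Segment a recording with inaSpeechSegmenter, then keep only the
# intervals labelled as music. Tuple format per the project README.

def music_intervals(segments: list) -> list:
    """Extract (start, end) pairs for segments labelled 'music'."""
    return [(start, end) for label, start, end in segments
            if label == "music"]

def find_music(path: str) -> list:
    from inaSpeechSegmenter import Segmenter  # pip install inaSpeechSegmenter

    seg = Segmenter()  # CNN segmenter; downloads model weights on first use
    return music_intervals(seg(path))
```

The resulting (start, end) pairs can then be handed to an audio cutter such as ffmpeg to splice the music back together.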
StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Python Speech related posts
- Easy video transcription and subtitling with Whisper, FFmpeg, and Python
- Using Groq to Build a Real-Time Language Translation App
- OpenAI deems its voice cloning tool too risky for general release
- SOTA ASR Tooling: Long-Form Transcription
- Deploying whisperX on AWS SageMaker as Asynchronous Endpoint
- Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old
- Base TTS (Amazon): The largest text-to-speech model to-date
Index
What are some of the best open-source Speech projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | MockingBird | 33,736 |
2 | TTS | 28,959 |
3 | datasets | 18,345 |
4 | whisperX | 8,869 |
5 | EmotiVoice | 6,234 |
6 | modelscope | 5,984 |
7 | lingvo | 2,781 |
8 | aeneas | 2,379 |
9 | gTTS | 2,133 |
10 | DeepFilterNet | 1,886 |
11 | whisper-timestamped | 1,481 |
12 | dc_tts | 1,150 |
13 | pykaldi | 977 |
14 | NATSpeech | 944 |
15 | voicefixer | 896 |
16 | lhotse | 861 |
17 | SALMONN | 786 |
18 | diffwave | 720 |
19 | inaSpeechSegmenter | 692 |
20 | Speech-enhancement | 583 |
21 | allosaurus | 502 |
22 | StarGANv2-VC | 454 |
23 | UniSpeech | 387 |