Top 12 Python speech-processing Projects
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
-
Wave-U-Net-for-Speech-Enhancement
A PyTorch implementation of Wave-U-Net, adapted for speech enhancement.
-
hifigan-denoiser
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
-
NLP-Guide
Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.
-
speech-emotion-recognition
A program that uses neural networks to detect emotions from pre-recorded and real-time speech
Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
Project mention: Retentive Network: A Successor to Transformer Implemented in PyTorch | news.ycombinator.com | 2023-07-24
A retnet commit has now appeared in Microsoft's torchscale repo:
https://github.com/microsoft/torchscale/commit/bf65397b26469...
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
Project mention: Linux Audio Noise suppression using deep filtering in Rust | news.ycombinator.com | 2023-06-06
Resemble AI | San Francisco Bay Area (office in Santa Clara, CA) | Full-Time | Full-Stack Engineer, Frontend Engineer, Product Designer
Resemble AI creates high-quality synthetic voices that capture human emotion. We're a venture-backed high-growth startup that's looking to shake up an entire industry with state of the art AI. Our product changes the way that thousands of brands, media companies, creative agencies, and game studios create speech content. We believe that the way to build an enticing product and a solid team is by encouraging innovation and enabling continuous education. That's why every Friday is a day that you can use to work on anything you want, Resemble-related or not.
Recently, we open sourced a state of the art speech enhancement model: https://github.com/resemble-ai/resemble-enhance
We're hiring for three roles:
Full Stack Engineer - Can you break the entire stack? You're the right person for this job. Work on our Rails app, with sprinkles of React, and Python for the deep learning. Everything is dockerized, and we use Kubernetes to deploy.
Frontend Engineer - We're hiring a Frontend Engineer proficient in React, TypeScript, and Ruby on Rails to shape our user experience. Join our team to develop user-friendly interfaces and collaborate on building exceptional web experiences.
Product Designer - As a Product Designer, you will lead the end-to-end design process, from concept to implementation, ensuring a seamless and delightful user experience. You will collaborate with cross-functional teams to define product vision, conduct user research, create visually compelling interfaces, and develop interactive prototypes.
If interested, reach out directly to me: zohaib [at] resemble.ai
Project mention: Using Whisper to transcribe the entire Forensic Files series | /r/DataHoarder | 2023-06-04
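The whisper-timestamped mention above concerns word-level timestamps and confidence scores. As a rough illustration of how such output might be post-processed, the sketch below flags words whose confidence falls below a threshold. The dictionary layout (`segments` containing `words` with `text`, `start`, `end`, and `confidence` keys) is an assumption modeled on word-level ASR output, the helper name `low_confidence_words` is hypothetical, and the sample values are hand-made rather than real transcription results.

```python
from typing import Any, Dict, List


def low_confidence_words(result: Dict[str, Any], threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Collect words whose confidence is below `threshold`.

    Assumes `result` looks like {"segments": [{"words": [{...}, ...]}, ...]}.
    Words without a "confidence" key are treated as fully confident.
    """
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word.get("confidence", 1.0) < threshold:
                flagged.append(word)
    return flagged


# Hand-made sample in the assumed layout (not real model output).
sample_result = {
    "text": " Bonjour tout le monde",
    "segments": [
        {
            "start": 0.0,
            "end": 1.4,
            "text": " Bonjour tout le monde",
            "words": [
                {"text": "Bonjour", "start": 0.10, "end": 0.52, "confidence": 0.93},
                {"text": "tout", "start": 0.52, "end": 0.75, "confidence": 0.41},
                {"text": "le", "start": 0.75, "end": 0.88, "confidence": 0.87},
                {"text": "monde", "start": 0.88, "end": 1.40, "confidence": 0.95},
            ],
        }
    ],
}

for word in low_confidence_words(sample_result, threshold=0.5):
    print(f'{word["text"]!r} at {word["start"]:.2f}s-{word["end"]:.2f}s '
          f'(confidence {word["confidence"]:.2f})')
```

Flagged words could then be reviewed by hand or re-aligned before generating subtitles; the 0.5 threshold is an arbitrary starting point, not a recommended value.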
Python speech-processing related posts
- Show HN: PodText.ai – Search anything said on a podcast, Highlight text to play
- I wanted to use OpenAI's Whisper speech-to-text on my Mac without installing stuff in the Terminal so I made MacWhisper, a free Mac app to transcribe audio and video files for easy transcription and subtitle generation. Would love to hear some feedback on it!
- I won several speaker diarization challenges with pyannote.audio
- Can Whisper differentiate between different voices?
- [D] Is there a way to distinguish different human voices from 1 audio file ?
- Post-Game Analysis: Destiny & Alex VS Andrew & Zen Shapiro
- A quick and dirty tool for automatically analyzing speaking time in online debates (Effortpost)
Index
What are some of the best open-source speech-processing projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | speechbrain | 7,836 |
2 | torchscale | 2,909 |
3 | whisper-timestamped | 1,481 |
4 | SincNet | 1,075 |
5 | voicefixer | 896 |
6 | resemble-enhance | 885 |
7 | UniSpeech | 387 |
8 | Wave-U-Net-for-Speech-Enhancement | 302 |
9 | whisper-auto-transcribe | 192 |
10 | hifigan-denoiser | 188 |
11 | NLP-Guide | 64 |
12 | speech-emotion-recognition | 17 |