Top 7 Python speaker-diarization Projects
-
FunASR
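A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. It ships a rich set of high-performing open-source pretrained models, supports speech recognition, voice activity detection, text post-processing, and more, and can be deployed as a service.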
-
uis-rnn
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17
You might check out this list from espnet. It lists the different corpora they use to train their models, sorted by language and task (ASR, TTS, etc.):
https://github.com/espnet/espnet/blob/master/egs2/README.md
Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them, and whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
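Besides word-level timestamps, whisper-timestamped attaches a confidence score to each word. A minimal sketch of using those scores to flag words worth checking by hand, assuming the result dictionary follows the JSON layout shown in the whisper-timestamped README (a `segments` list whose items carry a `words` list with `text`, `start`, `end`, and `confidence` keys). The helper `low_confidence_words`, the 0.5 threshold, and the sample values are hypothetical illustrations, not part of the library.

```python
from typing import Any


# Hypothetical helper; assumes the whisper-timestamped result layout:
# result["segments"][i]["words"][j] -> {"text", "start", "end", "confidence"}
def low_confidence_words(
    result: dict[str, Any], threshold: float = 0.5
) -> list[tuple[str, float, float, float]]:
    """Return (text, start, end, confidence) for every word scored below threshold."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["confidence"] < threshold:
                flagged.append(
                    (word["text"], word["start"], word["end"], word["confidence"])
                )
    return flagged


# Hand-written sample in the assumed layout (illustrative values, not real output).
example = {
    "segments": [
        {
            "words": [
                {"text": "Hello", "start": 0.50, "end": 0.90, "confidence": 0.93},
                {"text": "wrld", "start": 0.95, "end": 1.30, "confidence": 0.31},
            ]
        }
    ]
}

print(low_confidence_words(example))  # [('wrld', 0.95, 1.3, 0.31)]
```

Because the timestamps travel with each flagged word, a reviewer can jump straight to that spot in the audio instead of re-listening to the whole recording.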
These will be 3-5 hour recordings of 4-5 people. I plan to use https://github.com/yinruiqing/pyannote-whisper to generate the transcript from the recording.
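As its name suggests, pyannote-whisper pairs pyannote speaker diarization with Whisper transcription. The core merge step can be sketched in plain Python: give each transcribed word the speaker whose diarization turn overlaps it the most, then join consecutive words from the same speaker into utterances. This is a minimal sketch of the general technique, not pyannote-whisper's actual code; the `Turn` and `Word` types, the `SPEAKER_00`-style labels, and the timings are hypothetical stand-ins for real diarization output and word-level timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One diarization turn: who spoke from start to end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Word:
    """One transcribed word with word-level timestamps (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[Word], turns: list[Turn]) -> list[tuple[Optional[str], str]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all get the label None.
    """
    labelled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            o = overlap(word.start, word.end, turn.start, turn.end)
            if o > best_overlap:
                best_speaker, best_overlap = turn.speaker, o
        labelled.append((best_speaker, word.text))
    return labelled


def group_utterances(labelled: list[tuple[Optional[str], str]]) -> list[tuple[Optional[str], str]]:
    """Join consecutive words from the same speaker into one utterance."""
    utterances: list[tuple[Optional[str], str]] = []
    for speaker, text in labelled:
        if utterances and utterances[-1][0] == speaker:
            utterances[-1] = (speaker, utterances[-1][1] + " " + text)
        else:
            utterances.append((speaker, text))
    return utterances


# Made-up example: two speakers, plus one word outside every turn.
turns = [Turn(0.0, 2.0, "SPEAKER_00"), Turn(2.0, 4.5, "SPEAKER_01")]
words = [
    Word(0.1, 0.5, "Hi"),
    Word(0.6, 1.0, "there."),
    Word(1.8, 2.3, "Hello"),  # straddles the boundary; more of it falls in SPEAKER_01's turn
    Word(2.5, 3.0, "back."),
    Word(5.0, 5.3, "Bye."),
]
for speaker, text in group_utterances(assign_speakers(words, turns)):
    print(speaker, text)
```

Maximum overlap is one simple rule for words that straddle a turn boundary; a word that overlaps no turn stays unlabelled (`None`) rather than being forced onto the nearest speaker.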
Python speaker-diarization related posts
- Summarization of long transcriptions
- Show HN: PodText.ai – Search anything said on a podcast, Highlight text to play
- I wanted to use OpenAI's Whisper speech-to-text on my Mac without installing stuff in the Terminal so I made MacWhisper, a free Mac app to transcribe audio and video files for easy transcription and subtitle generation. Would love to hear some feedback on it!
- I won several speaker diarization challenges with pyannote.audio
- Can Whisper differentiate between different voices?
- [D] Is there a way to distinguish different human voices from 1 audio file ?
- Post-Game Analysis: Destiny & Alex VS Andrew & Zen Shapiro
Index
What are some of the best open-source speaker-diarization projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | espnet | 7,872
2 | speechbrain | 7,869
3 | FunASR | 3,299
4 | uis-rnn | 1,529
5 | whisper-timestamped | 1,501
6 | diart | 789
7 | pyannote-whisper | 414