pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
I've tried Resemblyzer's method, but it always either cut out too much of his voice or included too much of other people's. It also required that I have a clip of him talking, and the quality of that clip heavily affected its performance.
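For anyone wondering why it behaves that way, here is a rough sketch of the general idea behind reference-clip matching: compare an embedding of each audio window against an embedding of the reference clip and keep the windows that are similar enough. This is not Resemblyzer's actual code. The function names, the 256-dimensional vectors, and the 0.75 threshold are all assumptions for illustration, and the embeddings are synthetic stand-ins for what a speaker encoder would produce.

```python
import numpy as np


def cosine_similarity(windows, reference):
    """Cosine similarity between each row of `windows` and `reference`."""
    windows = windows / np.linalg.norm(windows, axis=1, keepdims=True)
    reference = reference / np.linalg.norm(reference)
    return windows @ reference


def select_target_windows(window_embeddings, reference_embedding, threshold=0.75):
    """Return a boolean keep-mask and the similarity scores.

    `threshold` is a hypothetical tuning knob, not a value taken from any library.
    """
    sims = cosine_similarity(
        np.asarray(window_embeddings, dtype=float),
        np.asarray(reference_embedding, dtype=float),
    )
    return sims >= threshold, sims


# Synthetic stand-ins for speaker embeddings (assumption: 256 dimensions).
rng = np.random.default_rng(0)
target_voice = rng.normal(size=256)
other_voice = rng.normal(size=256)

# The reference clip is a slightly noisy view of the target speaker.
reference = target_voice + 0.1 * rng.normal(size=256)

# Three audio windows: target, someone else, target again.
windows = np.stack([
    target_voice + 0.3 * rng.normal(size=256),
    other_voice + 0.3 * rng.normal(size=256),
    target_voice + 0.3 * rng.normal(size=256),
])

keep, sims = select_target_windows(windows, reference, threshold=0.75)
print(keep, np.round(sims, 2))
```

Everything hinges on that one threshold and on how clean the reference embedding is. Set it too high and you drop real target speech, set it too low and other voices leak in, and a noisy reference clip squeezes the scores together so no single threshold works well. That would be consistent with the behaviour described above.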
In my limited experience trying to get diarization to work, it is a pain to get good results unless the environment can be controlled (quiet room, reasonable microphones, etc.). I'd suggest checking out pyannote, which I've used with reasonable accuracy, though I had to train a few of the models in the pipeline myself.
It may also be worth checking out SpeechBrain, which was recently released. It has some pre-trained models that might give you a reasonable baseline to start with, though I haven't used it personally.