[D] State of the art for speaker diarization?

Resemblyzer

Mentions: 4 | Stars: 2,592 | Activity: 3.4 | Language: Python

A Python package to analyze and compare voices with deep learning

I've tried Resemblyzer's method, but it always either cut out too much of his voice or included too much of the other speakers. It also required a reference clip of him talking, and the quality of that clip heavily affected its performance.
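The trade-off described above can be illustrated with a minimal sketch. It assumes the general approach of comparing each audio window's speaker embedding against the embedding of a reference clip and keeping the windows above a similarity threshold. The vectors below are made-up toy values, and `cosine_similarity` and `select_target_windows` are hypothetical helper names, not part of Resemblyzer's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_windows(reference, windows, threshold):
    """Return indices of windows at least `threshold` similar to the reference."""
    return [
        i
        for i, embedding in enumerate(windows)
        if cosine_similarity(reference, embedding) >= threshold
    ]


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
reference = [1.0, 0.0, 0.0]   # embedding of the reference clip
windows = [
    [0.9, 0.1, 0.0],          # target speaker, clean audio
    [0.6, 0.6, 0.0],          # target speaker, noisy or overlapped audio
    [0.0, 1.0, 0.0],          # a different speaker
    [0.3, 0.9, 0.1],          # a different speaker with a somewhat similar voice
]

strict = select_target_windows(reference, windows, threshold=0.9)
loose = select_target_windows(reference, windows, threshold=0.3)

print(strict)  # [0]
print(loose)   # [0, 1, 3]
```

With the strict threshold, the noisy window from the target speaker is dropped (too much of his voice is cut). With the loose threshold, a window from another speaker is kept (too much of the others is included). A poor reference clip shifts every similarity score, which is one plausible reason the clip quality matters so much.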

pyannote-audio

Mentions: 15 | Stars: 5,027 | Activity: 8.6 | Language: Jupyter Notebook

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

In my short experience trying to get diarization to work, it is a pain to get good results unless the environment can be controlled (quiet room, reasonable microphones, etc.). I'd suggest checking out pyannote, which I've used with reasonable accuracy, though I had to train a few of the models in the pipeline myself.

speechbrain

Mentions: 26 | Stars: 7,869 | Activity: 9.8 | Language: Python

A PyTorch-based Speech Toolkit

It may be worth checking out SpeechBrain as well, which was recently released. It has some pre-trained models that might give you a reasonable baseline to start with, but I haven't used it personally.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
