[D] State of the art for speaker diarization?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • Resemblyzer

    A python package to analyze and compare voices with deep learning

  • I've tried Resemblyzer's method, but it always either cut out too much of his voice or included too much of other speakers. It also required a clip of him talking, and the quality of that clip heavily affected its performance.

  • pyannote-audio

    Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

  • In my short experience trying to get diarization to work, it is a pain to get good results unless the environment can be controlled (quiet room, reasonable microphones, etc.). I'd suggest checking out pyannote, which I've used with reasonable accuracy, though I had to train a few models in the pipeline myself.

  • speechbrain

    A PyTorch-based Speech Toolkit

  • It may be worth checking out SpeechBrain as well, which was recently released. It has some pre-trained models, so it might give you a reasonable baseline to start with, but I haven't used it personally.
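The reference-clip approach described in the Resemblyzer comment above can be sketched roughly as follows. This is a minimal illustration, not Resemblyzer's actual code: it assumes you already have one embedding per short audio window plus one embedding for the reference clip (for example, from a speaker encoder such as Resemblyzer), and the names `cosine_similarity`, `select_speaker_segments`, and the `threshold` parameter are hypothetical. The threshold is where the trade-off in that comment shows up: set it too high and the target speaker's voice gets cut out, too low and other speakers leak in.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def select_speaker_segments(window_embeddings, window_times,
                            reference_embedding, threshold=0.75):
    """Keep windows that resemble the reference speaker and merge neighbours.

    window_embeddings: one vector per audio window (assumed precomputed).
    window_times: (start, end) in seconds per window, sorted by start time.
    reference_embedding: vector computed from the reference clip.
    threshold: hypothetical similarity cut-off; needs tuning per recording.
    """
    segments = []
    for embedding, (start, end) in zip(window_embeddings, window_times):
        if cosine_similarity(embedding, reference_embedding) < threshold:
            continue  # window does not sound like the reference speaker
        if segments and start <= segments[-1][1]:
            # Touches or overlaps the previous kept window: extend it.
            prev_start, prev_end = segments[-1]
            segments[-1] = (prev_start, max(prev_end, end))
        else:
            segments.append((start, end))
    return segments


# Toy 2-D vectors standing in for real speaker embeddings.
reference = [1.0, 0.0]
windows = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [1.0, 0.05]]
times = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
print(select_speaker_segments(windows, times, reference, threshold=0.75))
```

Because everything is compared against a single reference vector, a noisy or short reference clip shifts every similarity score at once, which is consistent with the observation that the clip's quality heavily affects the result.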

