Help picking a good speech recognition library

This page summarizes the projects mentioned and recommended in the original post on /r/learnpython

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • DeepSpeech

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

  • https://github.com/mozilla/DeepSpeech (no longer actively supported by Mozilla but still a pretty good library, relatively easy to use, and decent out of the box accuracy)

  • Kaldi Speech Recognition Toolkit

    kaldi-asr/kaldi is the official location of the Kaldi project.

  • https://kaldi-asr.org/ (best out of the box accuracy but it is a complicated toolkit and not beginner friendly)

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • espnet

    End-to-End Speech Processing Toolkit

  • https://github.com/espnet/espnet (kind of like a newer Kaldi, but also not beginner friendly)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts