Need help with training an ASR model from scratch.

This page summarizes the projects mentioned and recommended in the original post on

  • espnet

    End-to-End Speech Processing Toolkit

    You don't actually need phone-level alignments for your data. Both hybrid and end-to-end approaches can work with utterance-level alignment. For the hybrid approach, you need a lexicon that maps each unique word in your training transcriptions to its phone sequence; you can obtain this with CMU's tool. For the end-to-end approach, you need a byte pair encoding (BPE) tokenizer to split the words in the transcriptions into subwords.

  • NeMo

    NeMo: a toolkit for conversational AI

    This is a relatively small amount of speech for training a model from scratch, but you can initialize training from another pre-trained model. There are a number of end-to-end ASR toolkits that can be used for this: and
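
The lexicon step for the hybrid approach can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the pronunciation source is a plain-text dictionary in CMUdict format (`WORD  PH1 PH2 ...`), transcripts are whitespace-tokenized, and `parse_cmudict` and `build_lexicon` are hypothetical helper names, not part of any toolkit. Words missing from the dictionary are reported so they can be handled separately, for example with a grapheme-to-phoneme (G2P) tool.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse CMUdict-style lines ("WORD  PH1 PH2 ...") into word -> phones.

    Assumption: comment lines start with ";;;". Alternate pronunciations
    such as "READ(1)" are skipped, so only the first entry per word is kept.
    """
    prons: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        if "(" in word:  # alternate pronunciation, e.g. READ(1)
            continue
        prons.setdefault(word.lower(), phones)
    return prons


def build_lexicon(
    transcripts: Iterable[str], prons: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Map every unique transcript word to its phone sequence.

    Returns the lexicon (sorted by word) and the set of out-of-vocabulary
    words that still need pronunciations from another source.
    """
    words = {w for t in transcripts for w in t.lower().split()}
    lexicon = {w: prons[w] for w in sorted(words) if w in prons}
    oov = words - set(lexicon)
    return lexicon, oov


if __name__ == "__main__":
    # Toy dictionary excerpt in CMUdict format (illustrative data only).
    cmudict_sample = [
        ";;; toy excerpt",
        "HELLO  HH AH0 L OW1",
        "WORLD  W ER1 L D",
        "READ  R EH1 D",
        "READ(1)  R IY1 D",
    ]
    prons = parse_cmudict(cmudict_sample)
    lexicon, oov = build_lexicon(["hello world", "read the world"], prons)
    for word, phones in lexicon.items():
        print(word, " ".join(phones))  # one "word PH1 PH2 ..." line per entry
    print("OOV:", sorted(oov))  # OOV: ['the']
```

Keeping only the first pronunciation per word is a simplification; hybrid lexicons can list several pronunciations for the same word.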

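For the end-to-end approach, the idea behind BPE is to start from characters and repeatedly merge the most frequent adjacent symbol pair into a new subword. The toy implementation below (hypothetical `learn_bpe` and `apply_bpe` helpers, pure Python, no optimizations) only illustrates that idea; in practice ASR recipes typically train the subword model with an existing library such as SentencePiece rather than code like this.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def _merge(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every occurrence of the adjacent pair with one joined symbol."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words: List[str], num_merges: int) -> List[Pair]:
    """Learn up to num_merges BPE merge rules from a list of words."""
    # Each word starts as its characters plus an end-of-word marker.
    vocab: Counter = Counter(tuple(w) + ("</w>",) for w in words)
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first on ties)
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word: str, merges: List[Pair]) -> List[str]:
    """Segment a word by applying the learned merges in learned order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = learn_bpe(corpus, num_merges=5)
    print(merges)
    print(apply_bpe("lowest", merges))  # ['low', 'est</w>']
```

The `</w>` end-of-word marker lets a character sequence at the end of a word become its own unit, which is why the unseen word `lowest` splits into `low` and `est</w>`.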

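Initializing from a pre-trained model usually means copying every parameter whose name and shape still match, and re-initializing the rest, typically the output layer when the new token vocabulary has a different size. The sketch below illustrates that logic with plain nested lists standing in for tensors; `warm_start`, `shape`, and the parameter names are hypothetical. With PyTorch, the same filtered dictionary would normally be built from `state_dict()` and passed to `model.load_state_dict(filtered, strict=False)`.

```python
import copy
from typing import Any, Dict, List, Tuple

Params = Dict[str, Any]  # parameter name -> nested list of floats (toy tensor)


def shape(x: Any) -> Tuple[int, ...]:
    """Return the shape of a nested-list 'tensor'."""
    dims: List[int] = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def warm_start(new_params: Params, pretrained: Params) -> Tuple[Params, List[str], List[str]]:
    """Initialize a new model's parameters from a pre-trained checkpoint.

    Parameters with the same name and shape are copied from the checkpoint;
    all others keep their fresh initialization and are trained from scratch.
    Returns (merged parameters, copied names, re-initialized names).
    """
    merged: Params = {}
    copied: List[str] = []
    kept: List[str] = []
    for name, value in new_params.items():
        old = pretrained.get(name)
        if old is not None and shape(old) == shape(value):
            merged[name] = copy.deepcopy(old)  # don't alias the checkpoint
            copied.append(name)
        else:
            merged[name] = value
            kept.append(name)
    return merged, copied, kept


if __name__ == "__main__":
    # Hypothetical checkpoint: encoder weights plus a 3-token output layer.
    pretrained = {
        "encoder.weight": [[0.5, -0.2], [0.1, 0.3]],
        "decoder.weight": [[0.2, 0.2], [0.4, 0.1], [0.0, 0.9]],
    }
    # New model: same encoder shape, but a 4-token output layer.
    fresh = {
        "encoder.weight": [[0.0, 0.0], [0.0, 0.0]],
        "decoder.weight": [[0.0, 0.0] for _ in range(4)],
    }
    merged, copied, kept = warm_start(fresh, pretrained)
    print("copied:", copied)  # copied: ['encoder.weight']
    print("re-initialized:", kept)  # re-initialized: ['decoder.weight']
```

Matching on both name and shape is what lets the pre-trained encoder be reused even when the output vocabulary changes.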