[D] ASR/Automatic Speech Recognition toolkit that provides precise word-level timing data? (e.g., where in the audio stream a word starts and ends?)

Kaldi Speech Recognition Toolkit

Mentions: 22 | Stars: 13,706 | Activity: 7.4 | Language: Shell

kaldi-asr/kaldi is the official location of the Kaldi project.

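It sounds like you could use forced alignment, which takes the audio together with its transcript and works out where each word starts and ends. You can do this with Kaldi directly or with the Montreal Forced Aligner (MFA), which uses Kaldi as its ASR backend. Full disclosure: I'm the primary maintainer of MFA, but it should fit your use case.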

vosk-api

Mentions: 59 | Stars: 7,025 | Activity: 5.9 | Language: Jupyter Notebook

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
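
For the word-level timing question in the post, here is a minimal sketch of how vosk-api's Python bindings can report where each word starts and ends. It assumes the `vosk` package is installed, a model has been downloaded and unpacked, and the input is a mono 16-bit PCM WAV file. The function names and the paths `talk.wav` and `model` are hypothetical, chosen only for illustration.

```python
import json
from typing import Dict, List


def parse_word_timings(result_json: str) -> List[Dict]:
    """Extract per-word timing entries from one Vosk result string."""
    payload = json.loads(result_json)
    # Results without recognized words carry no "result" list.
    return [
        {"word": w["word"], "start": w["start"], "end": w["end"]}
        for w in payload.get("result", [])
    ]


def transcribe_with_timings(wav_path: str, model_dir: str) -> List[Dict]:
    """Run Vosk over a WAV file and collect word timings (in seconds)."""
    import wave
    from vosk import Model, KaldiRecognizer  # assumption: pip install vosk

    words: List[Dict] = []
    with wave.open(wav_path, "rb") as wav:
        rec = KaldiRecognizer(Model(model_dir), wav.getframerate())
        rec.SetWords(True)  # request per-word start/end times
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns true when an utterance is complete.
            if rec.AcceptWaveform(data):
                words.extend(parse_word_timings(rec.Result()))
        # Flush whatever is left after the last chunk.
        words.extend(parse_word_timings(rec.FinalResult()))
    return words
```

Calling `transcribe_with_timings("talk.wav", "model")` would return a list of dictionaries with `word`, `start`, and `end` keys. Note that this is free recognition, not forced alignment: if you already have the transcript, a forced aligner will generally give more reliable boundaries.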

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

How to get high-quality, low-cost Speech-to-Text transcription?
3 projects | /r/AskProgramming | 24 Jul 2022

Nerd-dictation, hackable speech to text on Linux
10 projects | news.ycombinator.com | 17 Jan 2022

Help picking a good speech recognition library
3 projects | /r/learnpython | 1 Dec 2021

help with script (beginner)
1 project | /r/learnpython | 7 Dec 2023

MacWhisper: Transcribe audio files on your Mac
8 projects | news.ycombinator.com | 23 Aug 2023

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.
Tags: speech-recognition, Kaldi, speech-to-text, speaker-verification, Audio
Post date: 23 Aug 2021
