Silero[0] seems to have decent performance (although you will have to do some minimal coding). I believe there are better ones if you're willing to tinker a bit more.
[0]: https://github.com/snakers4/silero-models
The output is intended more for captioning, so it's mostly short phrases with timestamps and no punctuation, but it'll give you a quick taste of what Vosk can do.
[1]: https://github.com/o-oconnell/mp4grep
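To show what timestamped, unpunctuated caption-style output looks like, here is a minimal sketch in plain Python. It is not the code of the tool in [1]. It assumes the JSON shape a Vosk recognizer returns when word timestamps are enabled: a `result` list of objects with `word`, `start`, and `end` fields. The function names `to_phrases` and `search` are hypothetical helpers, and the phrase length of four words is an arbitrary choice.

```python
import json


# Assumed input: a JSON string shaped like a Vosk result with word timestamps,
# e.g. {"result": [{"word": "...", "start": 0.0, "end": 0.4, "conf": 1.0}], "text": "..."}
def to_phrases(vosk_json: str, max_words: int = 4) -> list[tuple[float, float, str]]:
    """Split word-level results into short (start, end, text) phrases, caption style."""
    words = json.loads(vosk_json).get("result", [])
    phrases = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)  # no punctuation, just words
        phrases.append((chunk[0]["start"], chunk[-1]["end"], text))
    return phrases


def search(phrases: list[tuple[float, float, str]], term: str) -> list[tuple[float, float, str]]:
    """Return the phrases whose text contains `term` (case-insensitive), grep style."""
    term = term.lower()
    return [p for p in phrases if term in p[2].lower()]
```

Feeding each recognizer result string into `to_phrases` and then calling `search(phrases, "keyword")` gives you the timestamps at which a word was spoken. Matching only the text field (not the formatted line) keeps numeric search terms from accidentally matching timestamps.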
Coqui, a spin-off of Mozilla DeepSpeech, offers a speech-to-text (STT) engine that can be installed locally:
https://coqui.ai/