Ask HN: Speech to text models, are they usable yet?

RealtimeSTT

Mentions: 3 | Stars: 788 | Activity: 8.1 | Language: Python

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.

I have been using RealtimeSTT (https://github.com/KoljaB/RealtimeSTT/tree/master) with a lot of success for a while now. It works in real time, without any delays, on an old Nvidia card.
I tried it with German and English without issues. It should also work for French, though that might need a bit of tweaking. The code is very straightforward, but I'd recommend experimenting with the parameters to find what suits your use case.
It uses a model called "Whisper" under the hood.
Have fun :)
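
For anyone who wants a starting point for that parameter experimentation, here is a minimal sketch, not the project's official example. It assumes the `AudioToTextRecorder` class and the `model`, `language` and `post_speech_silence_duration` keyword arguments as I recall them from the RealtimeSTT README, so check them against the version you install. The helper names `recorder_options` and `transcribe_forever`, and the specific values (including the larger model for French), are my own illustrative guesses and not recommendations from the original comment.

```python
def recorder_options(language: str) -> dict:
    """Build keyword arguments for the recorder (hypothetical helper).

    The values are illustrative starting points, not tuned settings.
    """
    options = {
        "model": "small",       # Whisper model size
        "language": language,   # language code such as "en" or "de"
        # Assumed parameter: seconds of silence that end an utterance.
        "post_speech_silence_duration": 0.6,
    }
    if language == "fr":
        # Assumption: a larger model and a longer pause may help where
        # the defaults struggle. Experiment rather than trust this.
        options["model"] = "medium"
        options["post_speech_silence_duration"] = 0.8
    return options


def transcribe_forever(language: str) -> None:
    """Print each transcribed utterance until interrupted with Ctrl+C."""
    # Imported here so the options helper works without the library.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**recorder_options(language))
    try:
        while True:
            # text() blocks until one utterance has been transcribed.
            print(recorder.text())
    except KeyboardInterrupt:
        recorder.shutdown()
```

To try it, call `transcribe_forever("de")` from inside an `if __name__ == "__main__":` block (the library starts worker processes, so the guard matters on Windows) and speak into the default microphone.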

DeepSpeech

Mentions: 67 | Stars: 24,278 | Activity: 0.0 | Language: C++

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
