Top 23 speech-to-text Open-Source Projects
- DeepSpeech: an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
- SpeechRecognition: speech recognition module for Python, supporting several engines and APIs, online and offline.
- vosk-api: offline speech recognition API for Android, iOS, Raspberry Pi, and servers, with bindings for Python, Java, C#, and Node.
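A minimal file-transcription sketch against the Vosk Python bindings (hedged: untested here; it assumes a 16-bit mono WAV and a model directory downloaded separately from the Vosk site — the `model_dir` default is a placeholder):

```python
import json
import wave

def transcribe_wav(path, model_dir="model"):
    # Deferred import so the sketch loads even without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        pieces = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True at utterance boundaries.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result())["text"])
        pieces.append(json.loads(rec.FinalResult())["text"])
    return " ".join(p for p in pieces if p)
```

The same recognizer loop works on a live microphone stream by feeding it chunks from an audio callback instead of `readframes`.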
- pyvideotrans: translate a video from one language to another and add dubbing.
- silero-models: Silero Models — pre-trained speech-to-text, text-to-speech, and text-enhancement models made embarrassingly simple.
- willow: an open-source, local, self-hosted voice assistant alternative to Amazon Echo and Google Home.
- STT: 🐸STT, the deep learning toolkit for speech-to-text. Training and deploying STT models has never been so easy.
- whisper-timestamped: multilingual automatic speech recognition with word-level timestamps and confidence scores.
- awesome-whisper: 🔊 awesome list for Whisper, an open-source AI-powered speech recognition system developed by OpenAI.
Project mention: Show HN: I created automatic subtitling app to boost short videos | news.ycombinator.com | 2024-04-09
whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-style captions. It also supports diarisation. Perhaps it could be a starting point for a script that does what you need.
--
1: https://github.com/ggerganov/whisper.cpp/blob/master/README....
It's indeed suspicious. You're sending your voice samples, your various service accounts, your location, and more private data to a proprietary black box in some public cloud. Sorry, but this is a privacy nightmare. To be trustworthy it should be open source and self-hosted, like Mycroft (https://mycroft.ai) or Leon (https://getleon.ai).
Project mention: Amazon plans to charge for Alexa in June–unless internal conflict delays revamp | news.ycombinator.com | 2024-01-20
Yeah, Whisper is the closest thing we have, but even it requires more processing power than most of these edge devices have in order to feel smooth. I've started a voice interface project on a Raspberry Pi 4, and it takes about 3 seconds to produce a result. That's impressive, but not fast enough to compete with Alexa.
From what I gather a Pi 5 can do it in 1.5 seconds, which is closer, so I suspect it's only a matter of time before we do have fully local STT running directly on speakers.
> Probably anathema to the space, but if the devices leaned into the ~five tasks people use them for (timers, weather, todo list?) could probably tighten up the AI models to be more accurate and/or resource efficient.
Yes, this is the approach taken by a lot of streaming STT systems, like Kaldi [0]. Rather than use a fully capable model, you train a specialized one that knows what kinds of things people are likely to say to it.
[0] http://kaldi-asr.org/
Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06
It uses this, which does support diarization: https://github.com/m-bain/whisperX
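The whisperX pipeline is roughly transcribe → align → assign speakers. A hedged sketch of that flow, based on the project's README as I recall it (exact signatures may have drifted, and the diarization step needs a Hugging Face token for the underlying pyannote models):

```python
def diarized_segments(audio_path, hf_token, device="cpu"):
    # Deferred import so the sketch loads even without whisperx installed.
    import whisperx

    # 1. Transcribe with a (faster-whisper-backed) Whisper model.
    model = whisperx.load_model("small", device)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio)

    # 2. Refine word timestamps with an alignment model for the detected language.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device)
    result = whisperx.align(result["segments"], align_model, metadata,
                            audio, device)

    # 3. Run diarization and attach speaker labels to each word/segment.
    diarize = whisperx.DiarizationPipeline(use_auth_token=hf_token,
                                           device=device)
    result = whisperx.assign_word_speakers(diarize(audio), result)
    return result["segments"]
```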
For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
Start and Stop Listening Example
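A sketch of what start/stop listening might look like around faster-whisper. Hedged: the `Listener` class and its method names are invented for illustration; only `WhisperModel` and its `transcribe` method come from the faster-whisper README:

```python
import threading

class Listener:
    """Toggleable wrapper around faster-whisper transcription (illustrative only)."""

    def __init__(self, model_size="base"):
        self.model_size = model_size
        self._listening = threading.Event()

    def start(self):
        self._listening.set()

    def stop(self):
        self._listening.clear()

    @property
    def listening(self):
        return self._listening.is_set()

    def transcribe(self, wav_path):
        # Ignore audio while stopped.
        if not self.listening:
            return ""
        # Deferred import so the sketch loads without faster-whisper installed.
        from faster_whisper import WhisperModel
        model = WhisperModel(self.model_size)
        # transcribe() returns a lazy generator of segments plus info metadata.
        segments, _info = model.transcribe(wav_path)
        return " ".join(seg.text.strip() for seg in segments)
```

In a real application you would load the model once in `__init__` rather than per call, and feed it chunks from a microphone stream instead of a file path.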
Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
Project mention: Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning | news.ycombinator.com | 2023-10-02
I doubt it's currently actually "the best open source text to speech", but the answer I came up with when throwing a couple of hours at the problem some months ago was "Silero" [0, 1].
Following the "standalone" guide [2], it was pretty trivial to make the model render my sample text in about 100 English "voices" (many of which were similar to each other, and in varying quality). Sampling those, I got about 10 that were pretty "good". And maybe 6 that were the "best ones" (pretty natural, not annoying to listen to).
IIRC the license was free for noncommercial use only. I'm not sure exactly how open source they are, but it was simple to install the dependencies and write the basic Python to try it out; I had to write a for loop to try all the voices as I wanted. I ended up using something else for the project for other reasons, but this could still be a fairly good backup option for some use cases, IMO.
[0] https://github.com/snakers4/silero-models#text-to-speech
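The voice-sampling loop described above might look roughly like this. Hedged: an untested sketch based on Silero's standalone torch.hub usage; the `v3_en` package name, the `model.speakers` attribute, and the `apply_tts` signature are recollections, not verified against the current README:

```python
def render_all_voices(text, out_dir="voices"):
    # Requires torch and torchaudio; model weights are fetched via torch.hub.
    import os
    import torch
    import torchaudio

    os.makedirs(out_dir, exist_ok=True)
    model, _example_text = torch.hub.load(
        "snakers4/silero-models", "silero_tts",
        language="en", speaker="v3_en")  # one package bundling ~100+ English voices
    for voice in model.speakers:         # loop over every bundled voice
        audio = model.apply_tts(text=text, speaker=voice, sample_rate=48000)
        # apply_tts returns a 1-D waveform tensor; add a channel dim to save it.
        torchaudio.save(os.path.join(out_dir, f"{voice}.wav"),
                        audio.unsqueeze(0), 48000)
```

Listening through the rendered files in `out_dir` is then a quick way to shortlist the handful of voices worth keeping, as the commenter did.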
Fair points but with all due respect completely misses the point and context. My comment was a reply to a new user interested in esphome on a post about esphome.
You're talking about CircuitPython, 35KB web replies, PSRAM, UF2 bootloader, etc. These are comparatively very advanced topics and you didn't mention esphome once.
The comfort and familiarity of Amazon for what is already a new, intimidating, and challenging subject is of immeasurable value for a novice. They can click those links, fill a cart, and have stuff show up tomorrow with all of the usual ease, friendliness, and reliability of Amazon. If they get frustrated or it doesn't work out they can shove it in the box and get a full refund Amazon-style.
You're suggesting wandering all over the internet, ordering stuff from China, multiple vendors, etc while describing a bunch of things that frankly just won't matter to them. I say this as someone who has been an esphome and home assistant user since day one. The approach I described has never failed or remotely bothered me and over the past ~decade I've seen it suggested to new users successfully time and time again.
In terms of PSRAM, to my knowledge the only things it is utilized for in the esphome ecosystem are higher-resolution displays and more advanced voice assistant scenarios that almost always require the -S3 anyway and are very advanced, challenging use cases. I'm very familiar with displays, voice, the S3, and PSRAM, but more on that in a second...
> live with one less LX7 core and no Bluetooth
I'm the founder of Willow[0] and when comparing Willow to esphome the most frequent request we get is supporting bluetooth functionality i.e. esphome bluetooth proxy[1]. This is an extremely popular use case in the esphome/home assistant community. Not having bluetooth while losing a core and paying more is a bigger issue than pin spacing.
It's also a pretty obscure board, and while that's not a big deal to you and me, if you look around at docs, guides, etc. you'll see the cheap-o boards from Amazon are by far the most popular and common (unsurprisingly). Another plus for a new user.
Speaking of Willow (and back to PSRAM again) even the voice assistant satellite functionality of Home Assistant doesn't fundamentally require it - the most popular device doesn't have it either[2].
Very valuable comment with a lot of interesting information, just doesn't apply to context.
[0] - https://heywillow.io/
[1] - https://esphome.io/components/bluetooth_proxy.html
[2] - https://www.home-assistant.io/voice_control/thirteen-usd-voi...
Project mention: Rest in Peas: The Unrecognized Death of Speech Recognition (2010) | news.ycombinator.com | 2023-05-04
What has happened since then? I know Common Voice has come and gone: https://en.wikipedia.org/wiki/Common_Voice https://github.com/coqui-ai/STT
And I've seen some neural approaches too
No idea where to look for comparisons though.
https://github.com/MahmoudAshraf97/whisper-diarization
This project has been all right for transcribing audio with speaker diarization, if a bit finicky. The OpenAI model is better than other paid products (Descript, Riverside), so I'm looking forward to trying MacWhisper.
Project mention: How I converted a podcast into a knowledge base using Orama search and OpenAI whisper and Astro | dev.to | 2023-05-23
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them; whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
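For reference, pulling those word-level timings out of whisper-timestamped looks roughly like this (a hedged sketch based on the project README; `flatten_words` is a helper invented here, and the result-dict keys are assumptions about the library's output format):

```python
def flatten_words(result):
    """Flatten a whisper-timestamped result dict into (word, start, end, confidence) tuples."""
    return [(w["text"], w["start"], w["end"], w["confidence"])
            for seg in result["segments"]
            for w in seg["words"]]

def word_timestamps(wav_path, model_name="small"):
    # Deferred import so the sketch loads without whisper-timestamped installed.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(wav_path)
    model = whisper.load_model(model_name)
    return flatten_words(whisper.transcribe(model, audio))
```

The per-word confidence values are handy for flagging words to review before burning subtitles into a dub.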
speech-to-text related posts
- VOSK Offline Speech Recognition API
- Show HN: I created automatic subtitling app to boost short videos
- Easy video transcription and subtitling with Whisper, FFmpeg, and Python
- SOTA ASR Tooling: Long-Form Transcription
- Deploying whisperX on AWS SageMaker as Asynchronous Endpoint
- LLMs on your local Computer (Part 1)
- SpeechBrain 1.0: A free and open-source AI toolkit for all things speech
Index
What are some of the best open-source speech-to-text projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | whisper.cpp | 30,942 |
2 | DeepSpeech | 24,278 |
3 | Leon | 14,539 |
4 | Kaldi Speech Recognition Toolkit | 13,706 |
5 | whisperX | 8,965 |
6 | faster-whisper | 8,723 |
7 | SpeechRecognition | 8,040 |
8 | speechbrain | 7,869 |
9 | vosk-api | 7,025 |
10 | annyang | 6,547 |
11 | pyvideotrans | 5,556 |
12 | silero-models | 4,534 |
13 | lingvo | 2,780 |
14 | willow | 2,361 |
15 | STT | 2,131 |
16 | whisper-diarization | 1,985 |
17 | kalliope | 1,696 |
18 | soloud | 1,644 |
19 | whisper-asr-webservice | 1,617 |
20 | whisper-timestamped | 1,501 |
21 | Dragonfire | 1,372 |
22 | dc_tts | 1,150 |
23 | awesome-whisper | 989 |