speech-to-text

Top 23 speech-to-text Open-Source Projects

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • Project mention: Show HN: I created automatic subtitling app to boost short videos | news.ycombinator.com | 2024-04-09

    whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-like captions. It also supports diarisation. Perhaps it could be a starting point to create a better script that does what you need.

    --

    1: https://github.com/ggerganov/whisper.cpp/blob/master/README....

  • DeepSpeech

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

  • Project mention: Common Voice | news.ycombinator.com | 2023-12-05
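
    For context, a minimal offline transcription sketch with the deepspeech Python package; the model/scorer file names are placeholders, and the audio must be 16 kHz, 16-bit mono PCM:

      import wave
      import numpy as np
      import deepspeech

      # Placeholder paths: download the matching .pbmm model and .scorer from the releases page
      model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
      model.enableExternalScorer("deepspeech-0.9.3-models.scorer")

      with wave.open("audio_16k_mono.wav", "rb") as wav:
          audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

      print(model.stt(audio))
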
  • Leon

    🧠 Leon is your open-source personal assistant.

  • Project mention: Rabbit R1, Designed by Teenage Engineering | news.ycombinator.com | 2024-01-09

    It's indeed suspicious. You're sending your voice samples, your various service accounts, your location, and more private data to some proprietary black box in some public cloud. Sorry, but this is a privacy nightmare. It should be open source and self-hosted like Mycroft (https://mycroft.ai) or Leon (https://getleon.ai) to be trustworthy.

  • Kaldi Speech Recognition Toolkit

    kaldi-asr/kaldi is the official location of the Kaldi project.

  • Project mention: Amazon plans to charge for Alexa in June–unless internal conflict delays revamp | news.ycombinator.com | 2024-01-20

    Yeah, whisper is the closest thing we have, but even it requires more processing power than is present in most of these edge devices in order to feel smooth. I've started a voice interface project on a Raspberry Pi 4, and it takes about 3 seconds to produce a result. That's impressive, but not fast enough for Alexa.

    From what I gather a Pi 5 can do it in 1.5 seconds, which is closer, so I suspect it's only a matter of time before we do have fully local STT running directly on speakers.

    > Probably anathema to the space, but if the devices leaned into the ~five tasks people use them for (timers, weather, todo list?) could probably tighten up the AI models to be more accurate and/or resource efficient.

    Yes, this is the approach taken by a lot of streaming STT systems, like Kaldi [0]. Rather than use a fully capable model, you train a specialized one that knows what kinds of things people are likely to say to it.

    [0] http://kaldi-asr.org/

  • whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

  • Project mention: Easy video transcription and subtitling with Whisper, FFmpeg, and Python | news.ycombinator.com | 2024-04-06

    It uses this, which does support diarization: https://github.com/m-bain/whisperX
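
    For illustration, a transcribe-align-diarize sketch following the whisperX README; the API has shifted between releases, and the diarization step needs a Hugging Face token (HF_TOKEN below is a placeholder) for the pyannote models:

      import whisperx

      device = "cuda"  # or "cpu"
      audio = whisperx.load_audio("audio.mp3")

      # 1. Transcribe with the batched Whisper backend
      model = whisperx.load_model("large-v2", device, compute_type="float16")
      result = model.transcribe(audio, batch_size=16)

      # 2. Align to get word-level timestamps
      align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
      result = whisperx.align(result["segments"], align_model, metadata, audio, device)

      # 3. Optional speaker diarization
      diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
      result = whisperx.assign_word_speakers(diarize_model(audio), result)
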

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • Project mention: Using Groq to Build a Real-Time Language Translation App | dev.to | 2024-04-05

    For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
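
    For illustration, the basic faster-whisper call; the model size and compute type below are just example choices:

      from faster_whisper import WhisperModel

      # "small" + int8 keeps this runnable on CPU; swap in a larger model on a GPU
      model = WhisperModel("small", device="cpu", compute_type="int8")

      segments, info = model.transcribe("audio.wav", beam_size=5)
      print("Detected language:", info.language)
      for segment in segments:
          print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
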

  • SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

  • Project mention: help with script (beginner) | /r/learnpython | 2023-12-07

    Start and Stop Listening Example
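
    A start/stop background-listening sketch along those lines, using the library's listen_in_background helper (the Google Web Speech recognizer is just the default engine here):

      import time
      import speech_recognition as sr

      recognizer = sr.Recognizer()
      microphone = sr.Microphone()

      def callback(recognizer, audio):
          try:
              print(recognizer.recognize_google(audio))
          except sr.UnknownValueError:
              print("(could not understand audio)")

      with microphone as source:
          recognizer.adjust_for_ambient_noise(source)

      # Start listening in a background thread; the returned function stops it
      stop_listening = recognizer.listen_in_background(microphone, callback)
      time.sleep(10)
      stop_listening(wait_for_stop=False)
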

  • speechbrain

    A PyTorch-based Speech Toolkit

  • Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
  • vosk-api

    Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

  • Project mention: VOSK Offline Speech Recognition API | news.ycombinator.com | 2024-04-13
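
    For illustration, the typical offline file-transcription loop with the Vosk Python bindings; the model directory is a placeholder, and the audio should be 16-bit mono PCM:

      import json
      import wave
      from vosk import Model, KaldiRecognizer

      wf = wave.open("audio_16k_mono.wav", "rb")
      model = Model("vosk-model-small-en-us-0.15")  # placeholder: any unpacked Vosk model directory
      rec = KaldiRecognizer(model, wf.getframerate())

      while True:
          data = wf.readframes(4000)
          if len(data) == 0:
              break
          if rec.AcceptWaveform(data):
              print(json.loads(rec.Result())["text"])
      print(json.loads(rec.FinalResult())["text"])
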
  • annyang

    :speech_balloon: Speech recognition for your site

  • pyvideotrans

    Translate a video from one language to another and add dubbing.

  • Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06
  • silero-models

    Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

  • Project mention: Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning | news.ycombinator.com | 2023-10-02

    I doubt it's currently actually "the best open source text to speech", but the answer I came up with when throwing a couple of hours at the problem some months ago was "Silero" [0, 1].

    Following the "standalone" guide [2], it was pretty trivial to make the model render my sample text in about 100 English "voices" (many of which were similar to each other, and in varying quality). Sampling those, I got about 10 that were pretty "good". And maybe 6 that were the "best ones" (pretty natural, not annoying to listen to).

    IIRC the license was free for noncommercial use only. I'm not sure exactly "how open source" they are, but it was simple to install the dependencies and write the basic Python to try it out; I had to write a for loop to try all the voices like I wanted. I ended up using something else for the project for other reasons, but this could still be a fairly good backup option for some use cases IMO.

      [0] https://github.com/snakers4/silero-models#text-to-speech
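
    In the spirit of the comment above, a loop-over-voices sketch that loads the Silero TTS model through torch.hub; the speaker package and voice names follow the repo's English v3 release and may differ in newer versions:

      import torch

      # Per the silero-models README, silero_tts returns (model, example_text)
      model, example_text = torch.hub.load(
          repo_or_dir="snakers4/silero-models",
          model="silero_tts",
          language="en",
          speaker="v3_en",
      )

      sample_rate = 48000
      text = "The quick brown fox jumps over the lazy dog."
      for voice in ["en_0", "en_1", "en_2"]:  # the v3_en package ships on the order of 100 voices
          audio = model.apply_tts(text=text, speaker=voice, sample_rate=sample_rate)
          print(voice, audio.shape)
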

  • lingvo

    Lingvo is a framework for building neural networks in TensorFlow, particularly sequence models.

  • willow

    Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative

  • Project mention: ESPHome | news.ycombinator.com | 2024-04-23

    Fair points, but with all due respect this completely misses the point and context. My comment was a reply to a new user interested in esphome on a post about esphome.

    You're talking about CircuitPython, 35KB web replies, PSRAM, UF2 bootloader, etc. These are comparatively very advanced topics and you didn't mention esphome once.

    The comfort and familiarity of Amazon for what is already a new, intimidating, and challenging subject is of immeasurable value for a novice. They can click those links, fill a cart, and have stuff show up tomorrow with all of the usual ease, friendliness, and reliability of Amazon. If they get frustrated or it doesn't work out they can shove it in the box and get a full refund Amazon-style.

    You're suggesting wandering all over the internet, ordering stuff from China, multiple vendors, etc while describing a bunch of things that frankly just won't matter to them. I say this as someone who has been an esphome and home assistant user since day one. The approach I described has never failed or remotely bothered me and over the past ~decade I've seen it suggested to new users successfully time and time again.

    In terms of PSRAM, to my knowledge the only thing it is utilized for in the esphome ecosystem is higher resolution displays and more advanced voice assistant scenarios that almost always require -S3 anyway and are very advanced, challenging use cases. I'm very familiar with displays, voice, the S3, and PSRAM, but more on that in a second...

    > live with one less LX7 core and no Bluetooth

    I'm the founder of Willow[0] and when comparing Willow to esphome the most frequent request we get is supporting bluetooth functionality i.e. esphome bluetooth proxy[1]. This is an extremely popular use case in the esphome/home assistant community. Not having bluetooth while losing a core and paying more is a bigger issue than pin spacing.

    It's also a pretty obscure board, and while that's not a big deal to you and me, if you look around at docs, guides, etc. you'll see the cheap-o boards from Amazon are by far the most popular and common (unsurprisingly). Another plus for a new user.

    Speaking of Willow (and back to PSRAM again) even the voice assistant satellite functionality of Home Assistant doesn't fundamentally require it - the most popular device doesn't have it either[2].

    Very valuable comment with a lot of interesting information, just doesn't apply to context.

    [0] - https://heywillow.io/

    [1] - https://esphome.io/components/bluetooth_proxy.html

    [2] - https://www.home-assistant.io/voice_control/thirteen-usd-voi...

  • STT

    🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

  • Project mention: Rest in Peas: The Unrecognized Death of Speech Recognition (2010) | news.ycombinator.com | 2023-05-04

    What has happened since then? I know Common Voice has come and gone https://en.wikipedia.org/wiki/Common_Voice https://github.com/coqui-ai/STT

    And I've seen some neural approaches too

    No idea where to look for comparisons though.
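
    For illustration, Coqui STT's Python package (installed as stt) keeps essentially the same interface as the DeepSpeech codebase it was forked from; the file names below are placeholders:

      import wave
      import numpy as np
      from stt import Model

      model = Model("model.tflite")             # placeholder model path
      # model.enableExternalScorer("lm.scorer") # optional external language model

      with wave.open("audio_16k_mono.wav", "rb") as wav:
          audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

      print(model.stt(audio))
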

  • whisper-diarization

    Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

  • Project mention: MacWhisper: Transcribe audio files on your Mac | news.ycombinator.com | 2023-08-23

    https://github.com/MahmoudAshraf97/whisper-diarization

    This project has been alright for transcribing audio with speaker diarization. A bit finicky. The OpenAI model is better than other paid products (Descript, Riverside), so I'm looking forward to trying MacWhisper.

  • kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  • soloud

    Free, easy, portable audio engine for games

  • whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  • Project mention: How I converted a podcast into a knowledge base using Orama search and OpenAI whisper and Astro | dev.to | 2023-05-23
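
    For illustration, calling a running instance over HTTP; the endpoint path, default port, and form field name below are assumptions taken from the project's documentation, so check your deployment's interactive API docs:

      import requests

      url = "http://localhost:9000/asr"  # assumed default port
      with open("audio.wav", "rb") as f:
          resp = requests.post(
              url,
              params={"task": "transcribe", "output": "json"},
              files={"audio_file": f},   # assumed field name
          )
      print(resp.json())
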
  • whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

  • Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28

    Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]

    [0] https://github.com/linto-ai/whisper-timestamped
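
    For illustration, the basic whisper-timestamped call that returns per-word timings and confidence scores; the model size and language are example choices:

      import json
      import whisper_timestamped as whisper

      audio = whisper.load_audio("audio.wav")
      model = whisper.load_model("tiny", device="cpu")
      result = whisper.transcribe(model, audio, language="en")

      # Each segment carries a "words" list with start/end times and a confidence score
      print(json.dumps(result, indent=2, ensure_ascii=False))
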

  • Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions.

  • dc_tts

    A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  • awesome-whisper

    🔊 Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI

  • Project mention: Whisper as a PUSH to STT to Clipboard solution? | /r/OpenAI | 2023-08-26
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source speech-to-text projects? This list will help you:

Project Stars
1 whisper.cpp 30,942
2 DeepSpeech 24,278
3 Leon 14,539
4 Kaldi Speech Recognition Toolkit 13,706
5 whisperX 8,965
6 faster-whisper 8,723
7 SpeechRecognition 8,040
8 speechbrain 7,869
9 vosk-api 7,025
10 annyang 6,547
11 pyvideotrans 5,556
12 silero-models 4,534
13 lingvo 2,780
14 willow 2,361
15 STT 2,131
16 whisper-diarization 1,985
17 kalliope 1,696
18 soloud 1,644
19 whisper-asr-webservice 1,617
20 whisper-timestamped 1,501
21 Dragonfire 1,372
22 dc_tts 1,150
23 awesome-whisper 989
