STT vs media

Compare STT and media and see how they differ.

STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy. (by coqui-ai)

media

Simple ffmpeg-using multimedia decoder (by FlyingRhenquest)
              STT                          media
Mentions      11                           1
Stars         2,131                        12
Growth        2.7%                         -
Activity      0.6                          0.0
Last commit   about 2 months ago           over 4 years ago
Language      C++                          C++
License       Mozilla Public License 2.0   Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

STT

Posts with mentions or reviews of STT. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-04.

media

Posts with mentions or reviews of media. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-08-30.
  • CoquiTTS: 🐸💬 - Open Source Text-to-Speech framework.
    3 projects | /r/programming | 30 Aug 2021
    I tinkered with it briefly in the past. I didn't get particularly good results, but did find it pretty easy to integrate into a media handling library I wrote that's primarily a C++ wrapper for ffmpeg. The unit tests for the sphinx bits are here if anyone's curious. The status of the library is semi-abandoned currently, as I'm working on an updated one taking into account a bunch of stuff I learned about ffmpeg over the last several years. Still works pretty well for what it does.

What are some alternatives?

When comparing STT and media you can also consider the following projects:

DeepSpeech - DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

TTS - 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

NeMo - A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

vosk-api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

TTS - :robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

OBS-captions-plugin - Closed Captioning OBS plugin using Google Speech Recognition

flashlight - A C++ standalone library for machine learning

vakyansh-models - Open source speech to text models for Indic Languages

PaddleSpeech - Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python

LocalSTT - Android Speech Recognition Service using Vosk/Kaldi and Mozilla DeepSpeech