flashlight
STT
Metric | flashlight | STT
---|---|---
Mentions | 16 | 11
Stars | 5,116 | 2,103
Growth | 1.0% | 2.3%
Activity | 7.7 | 0.6
Last commit | 10 days ago | 18 days ago
Language | C++ | C++
License | MIT License | Mozilla Public License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
flashlight
- MatX: Efficient C++17 GPU numerical computing library with Python-like syntax
I think a comparison to PyTorch, TensorFlow and/or JAX is more relevant than a comparison to CuPy/NumPy.
And then maybe also a comparison to Flashlight (https://github.com/flashlight/flashlight) or other C/C++ based ML/computing libraries?
Also, there is no mention of it, so I suppose this does not support automatic differentiation?
- Project Resources
This Facebook AI project seems reasonably well structured, judging by its CMakeLists.txt. CMake is a build-system generator commonly used for C++ projects; it's how you produce the binaries that run your project: https://github.com/flashlight/flashlight
- [D] Deep Learning Framework for C++.
I built and maintain Flashlight, a C++-first library for ML/DL. We built Flashlight to be:
- What is the most used library for AI in C++?
I’ve never used it, but Facebook’s Flashlight looks interesting.
- Mozilla Common Voice Adds 16 New Languages and 4,600 New Hours of Speech
I've had good results with https://github.com/flashlight/flashlight/blob/master/flashli.... It seems to work well with spoken English in a variety of accents. The biggest limitation is that the architecture they have pretrained models for doesn't really work well with clips longer than ~15 seconds, so you have to segment your input files.
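The segmentation step mentioned above could look roughly like the following. This is only a minimal sketch using Python's standard `wave` module, not the commenter's method or anything from Flashlight itself: the function name `split_wav` and the 15-second default are assumptions for illustration, and a fixed-length cut like this can split words mid-utterance (a real pipeline would more likely cut on silence).

```python
import io
import wave


def split_wav(src, max_seconds=15.0):
    """Yield consecutive pieces of a WAV file, each at most max_seconds long.

    `src` is a path or a binary file object. Each piece is returned as the
    bytes of a complete WAV file. (Hypothetical helper, for illustration.)
    """
    with wave.open(src, "rb") as reader:
        frames_per_chunk = int(reader.getframerate() * max_seconds)
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as writer:
                # Copy the audio format of the source file.
                writer.setnchannels(reader.getnchannels())
                writer.setsampwidth(reader.getsampwidth())
                writer.setframerate(reader.getframerate())
                writer.writeframes(frames)
            yield buf.getvalue()
```

Each yielded chunk can be written to disk or handed to whatever recognizer you use; the last chunk is simply shorter when the clip length is not a multiple of the limit.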
STT
- Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
What has happened since then? I know Common Voice has come and gone https://en.wikipedia.org/wiki/Common_Voice https://github.com/coqui-ai/STT
And I've seen some neural approaches too
No idea where to look for comparisons though.
- Numen - FOSS voice control for handsfree computing
I basically just used Coqui STT: https://github.com/coqui-ai/STT
- Are there any OCR and Speech-to-Text services that are privacy friendly?
This speech-to-text engine works well: https://github.com/coqui-ai/STT. OpenAI's "Whisper" is probably better, but I haven't tried it: https://towardsdatascience.com/transcribe-audio-files-with-openais-whisper-e973ae348aa7
- Introducing Whisper
I use two STT tools to live-translate audio that I listen to, so I can look back (in paragraph form) at things that I or the YouTuber have previously said: https://github.com/coqui-ai/STT https://github.com/ratwithacompiler/OBS-captions-plugin
- You can now tether any prod Vector to Wire's Open Source Escape Pod • thedroidyouarelookingfor
I did have to install Coqui STT and go-asticoqui manually before I was able to run Chipper.
- I put together a tutorial and overview on how to use DeepSpeech to do Speech Recognition in Python
If anyone is looking for a maintained version of DeepSpeech, check out Coqui's repositories for STT and TTS. Coqui is led by the engineers who used to work on DeepSpeech at Mozilla.
- CoquiTTS: 🐸💬 - Open Source Text-to-Speech framework.
Link: https://github.com/coqui-ai/STT
- Mozilla Common Voice Adds 16 New Languages and 4,600 New Hours of Speech
- Coqui, a startup providing open speech tech for everyone
What are some alternatives?
DeepSpeech - DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
TTS - 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
NeMo - NeMo: a framework for generative AI
vosk-api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
TTS - 🤖💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
OBS-captions-plugin - Closed Captioning OBS plugin using Google Speech Recognition
PaddleSpeech - Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
vakyansh-models - Open source speech to text models for Indic Languages