vosk-browser
vosk-server
| | vosk-browser | vosk-server |
|---|---|---|
| Mentions | 3 | 4 |
| Stars | 330 | 843 |
| Growth | - | 1.8% |
| Activity | 0.0 | 5.5 |
| Last commit | 4 months ago | 29 days ago |
| Language | JavaScript | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
vosk-browser
- Show HN: I record myself on audio 24x7 and use an AI to process the information
Not the OP but I've been tinkering with the same concept (24/7 processing).
I'm using vosk browser (https://github.com/ccoreilly/vosk-browser) to do speech to text locally, and it works very well for English.
- Speech-to-Text Client-Side?
- On-device browser translations with Firefox Translations
I believe this is called the Bergamot project; more can be found here: https://browser.mt/
The GitHub repo for it is here: https://github.com/browsermt/bergamot-translator
The repo contains some details about how to run it in WASM, which is quite interesting for embedding it in pages. I've been playing around with using WASM to capture speech to text (https://github.com/ccoreilly/vosk-browser) and automatically translating it using Bergamot.
Results have been OK. I don't think the tech is quite there yet, and the speech to text obviously struggles with multiple speakers.
vosk-server
- Self-hosted audio transcription?
- Open Source ASR with user-specific custom vocabularies?
Through my research, the most promising real-time transcription options appear to be Vosk or Kaldi Gstreamer. I’ve set them both up & they appear to work well for general transcription, but I’m not sure how to handle the user-specific custom vocabularies.
- Voice2json: Offline speech and intent recognition on Linux
- Connecting vosk python model with react
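On the custom-vocabulary question above, one possible approach is sketched here: the vosk Python API lets `KaldiRecognizer` take an optional third argument, a JSON list of phrases that restricts recognition. The helper names `build_user_grammar` and `make_recognizer` are hypothetical, as is the idea of keeping one phrase list per user. This likely only works with models that support a dynamic graph (typically the small ones), and every word must already be in the model's vocabulary.

```python
import json


def build_user_grammar(user_phrases, allow_unknown=True):
    """Return a JSON list of phrases for one user (hypothetical helper).

    Phrases are lower-cased, whitespace-normalized and de-duplicated.
    "[unk]" lets the recognizer report out-of-grammar speech as unknown.
    """
    seen = set()
    phrases = []
    for phrase in user_phrases:
        normalized = " ".join(phrase.lower().split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            phrases.append(normalized)
    if allow_unknown:
        phrases.append("[unk]")
    return json.dumps(phrases)


def make_recognizer(model_path, sample_rate, user_phrases):
    """Create a recognizer limited to one user's phrases (hypothetical helper)."""
    # Imported lazily so build_user_grammar works without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_path)
    return KaldiRecognizer(model, sample_rate, build_user_grammar(user_phrases))
```

A server could call `make_recognizer` once per connection with that user's stored phrases. Whether this suits open-ended transcription, as opposed to command-style recognition, is untested here.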
What are some alternatives?
cheetah - On-device streaming speech-to-text engine powered by deep learning
vosk-api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
common-voice - Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
ovos-stt-plugin-vosk - vosk STT plugin for mycroft
kaldi-gstreamer-server - Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
react-native-vosk - Speech recognition module for react native using Vosk library
TTS - 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
haven - Haven is for people who need a way to protect their personal spaces and possessions without compromising their own privacy, through an Android app and on-device sensors
julius - Open-Source Large Vocabulary Continuous Speech Recognition Engine
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
vosk-android-demo - Offline speech recognition for Android with Vosk library.