vosk-browser
react-transcript-editor
Metric | vosk-browser | react-transcript-editor
---|---|---
Mentions | 3 | 1
Stars | 330 | 535
Growth | - | 0.6%
Activity | 0.0 | 0.0
Last commit | 4 months ago | 3 months ago
Language | JavaScript | JavaScript
License | Apache License 2.0 | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
vosk-browser
- Show HN: I record myself on audio 24x7 and use an AI to process the information
Not the OP but I've been tinkering with the same concept (24/7 processing).
I'm using vosk-browser (https://github.com/ccoreilly/vosk-browser) to do speech to text locally, and it works very well for English.
- Speech-to-Text Client-Side?
- On-device browser translations with Firefox Translations
I believe this is called the Bergamot project, more can be found here: https://browser.mt/
The GitHub repo for it is here: https://github.com/browsermt/bergamot-translator
The repo contains some details about how to run it in WASM which is quite interesting for embedding it in pages. I've been playing around with using WASM to capture speech to text (https://github.com/ccoreilly/vosk-browser) and automatically translating it using Bergamot.
Results have been OK. I don't think the tech is quite there yet, and the speech to text obviously struggles with multiple speakers.
react-transcript-editor
- Launch HN: Milk Video (YC W21) – Edit online event recordings quickly
Thanks for signing up and trying it out.
We actually point people to Descript for most use cases that aren't relevant to us.
Since speech-to-text APIs have become really good (props to companies like AssemblyAI, https://www.assemblyai.com/), transcript-based interfaces are going to become much more common.
Our product goal is to solve the use case around making the visual output, when editing/correction isn't the goal. That being said, the editor should be performant and work well, so lots to improve there.
As an aside, there are a few evolving open-source libraries that consume the output of these STT services (https://github.com/bbc/react-transcript-editor) and make turnkey transcript interfaces.
The newest/most developed one I like is based on Slate, and made by a really amazing engineer at the Wall Street Journal named Pietro.
Link: https://github.com/pietrop/slate-transcript-editor
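
To make "consuming the output of an STT service" concrete, here is a minimal JavaScript sketch of one step a transcript interface might perform: grouping word-level timestamps into paragraphs wherever the speaker pauses. The input shape (`text`, `start`, `end` in seconds), the `groupIntoParagraphs` helper, and the one-second pause threshold are all assumptions for illustration; they are not the data format or API of react-transcript-editor, slate-transcript-editor, or any particular STT provider.

```javascript
// Hypothetical word-level STT output: one entry per recognized word,
// with start/end times in seconds. This shape is assumed for illustration.
const words = [
  { text: "Thanks", start: 0.0, end: 0.4 },
  { text: "for", start: 0.45, end: 0.6 },
  { text: "watching.", start: 0.65, end: 1.1 },
  { text: "Next", start: 3.0, end: 3.3 },
  { text: "topic.", start: 3.35, end: 3.8 },
];

// Group words into paragraphs, starting a new paragraph whenever the
// silence between two consecutive words exceeds maxGapSeconds.
function groupIntoParagraphs(words, maxGapSeconds = 1.0) {
  const paragraphs = [];
  let current = null;
  for (const word of words) {
    // The first word always opens a paragraph (gap treated as infinite).
    const gap = current ? word.start - current.end : Infinity;
    if (gap > maxGapSeconds) {
      current = { start: word.start, end: word.end, words: [] };
      paragraphs.push(current);
    }
    current.words.push(word);
    current.end = word.end;
  }
  // Flatten each paragraph to its time range plus display text.
  return paragraphs.map((p) => ({
    start: p.start,
    end: p.end,
    text: p.words.map((w) => w.text).join(" "),
  }));
}

console.log(groupIntoParagraphs(words));
```

Keeping `start` and `end` on each paragraph is what would let an editor seek the media player to the clicked text; a real editor would also need to keep per-word timings so corrections stay aligned with the audio.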
What are some alternatives?
cheetah - On-device streaming speech-to-text engine powered by deep learning
React - The library for web and native user interfaces.
vosk-api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
gecko - Gecko - A Tool for Effective Annotation of Human Conversations
vosk-server - WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
freeCodeCamp - freeCodeCamp.org's open-source codebase and curriculum. Learn to code for free.
ovos-stt-plugin-vosk - vosk STT plugin for mycroft
create-react-app - Set up a modern web app by running one command.
react-native-vosk - Speech recognition module for react native using Vosk library
realtime-transcription-playground - A real-time transcription project using React and socketio
haven - Haven is for people who need a way to protect their personal spaces and possessions without compromising their own privacy, through an Android app and on-device sensors
glaemscribe - Glaemscribe, the tolkienian languages/writings transcription engine.