-
silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
I've been looking for something like this for a while. The previous best I could find was https://github.com/CorentinJ/Real-Time-Voice-Cloning, but it worked quite poorly on a lot of the test data I used. Can you advise on what a minimal training set might be (e.g., would a phonetic pangram be sufficient)? Thanks for the effort anyway. I'll test it tomorrow and report back if I have any feedback!
The app uses the Silero models (https://github.com/snakers4/silero-models) for speech-to-text, which only support English, Spanish, German, and Ukrainian. Unfortunately, this means those are the only languages the app can support for dataset generation.
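To make the language restriction concrete, here is a minimal sketch of how a tool could validate a requested language before loading a speech-to-text model. It is not the app's actual code: the helper names `check_language` and `load_stt_model` are hypothetical, and the short language codes are an assumption based on the four languages listed above. The `torch.hub.load` call follows the entry point shown in the silero-models README and needs PyTorch plus network access on first use.

```python
# Hypothetical mapping: the codes are assumed from the four languages named above.
SUPPORTED_STT_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "de": "German",
    "ua": "Ukrainian",
}


def check_language(code: str) -> str:
    """Return the normalized language code, or raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_STT_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_STT_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; supported: {supported}")
    return normalized


def load_stt_model(code: str, device: str = "cpu"):
    """Load a Silero STT model through torch.hub (downloads weights on first use)."""
    import torch  # imported lazily so check_language works without PyTorch installed

    language = check_language(code)
    # Per the silero-models README, this returns (model, decoder, utils).
    return torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_stt",
        language=language,
        device=torch.device(device),
    )
```

Calling `check_language("fr")` raises a `ValueError` before any download is attempted, which is the behaviour a dataset-generation tool would likely want for unsupported languages.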