DeepSpeech
DeepSpeech is an open-source, embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
-
common-voice-android
Repository of the "CV Project" app, an unofficial Mozilla Common Voice app that lets you contribute to the project from your device.
-
common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
I'd like to give a shout-out to Common Voice Android: https://github.com/Sav22999/common-voice-android
It's a handy app for anyone interested in contributing to the project. You can record your voice in the languages you speak and validate other users' contributions. I was a frequent contributor about two years ago, and back then this app had a much more user-friendly design than the official website.
Additionally, check out the official Common Voice Matrix channel: https://chat.mozilla.org/#/room/#common-voice:mozilla.org
> it was not at all obvious to me there was some way of speeding up getting a language in the first place.
Yeah, that's the biggest failing of Common Voice in my opinion. Getting a new language up to speed could be made much easier simply by linking to the relevant documentation, but even the existing links are broken, which I reported in March 2022: https://github.com/common-voice/common-voice/issues/3637
> I have no interest in wasting time contributing to a UI translation I actively don't want to be subjected to
Translating the UI may still help you get other people to record, even if you don't want to use it yourself.
> I'll see if I can submit some sentences at least
If you want to go faster, there's also a project that extracts sentences from Wikipedia etc. in small doses, which Mozilla's and Wikimedia's lawyers have agreed is fair use. I think you'd only need to define how Norwegian Bokmål separates sentences (e.g., split after a period, but not when the period ends a common abbreviation, like the "etc." in the previous sentence).
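To illustrate the kind of rule meant here, a minimal Python sketch of abbreviation-aware sentence splitting might look like the following. It is not the extraction project's actual code or rules format. The function name `split_sentences` and the short abbreviation list are hypothetical and far from complete; a real Bokmål rule set would need a much longer, reviewed list.

```python
import re

# Hypothetical, non-exhaustive list of Norwegian Bokmål abbreviations that end
# in a period. A real rule set would need a much longer, reviewed list.
ABBREVIATIONS = {"bl.a.", "f.eks.", "osv.", "dvs.", "ca.", "nr.", "etc."}


def split_sentences(text: str, abbreviations: set[str] = ABBREVIATIONS) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace, unless the token
    ending at that punctuation mark is a known abbreviation."""
    sentences = []
    start = 0
    # Candidate boundaries: sentence-ending punctuation followed by whitespace.
    for match in re.finditer(r"[.!?](?=\s)", text):
        end = match.end()
        # Last whitespace-delimited token before the candidate boundary.
        last_token = text[start:end].split()[-1]
        if last_token.lower() in abbreviations:
            continue  # e.g. "osv." or "F.eks." does not end a sentence
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Vi kjøpte epler, pærer osv. i går. Hva med deg? F.eks. kan du lage kake."))
```

A plain lookup table is the simplest design. It will still mis-split on unlisted abbreviations, and it will miss a real boundary when an abbreviation happens to end a sentence.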
Related posts
- Offline speech to text software
- Web Speech API is (still) broken on Linux circa 2023
- SuperImage: Sharpen your low-resolution pictures with the power of AI upscaling
- Mozilla Common Voice - Korean Language is live - Help Build a Korean Corpus for Training AI/Navi/etc
- Ask HN: Open-source video transcribing software?