whisper
NotificationBanner
Metric | whisper | NotificationBanner |
---|---|---|
Mentions | 289 | 0 |
Stars | 37,915 | 4,586 |
Growth | 18.6% | - |
Activity | 9.2 | 0.0 |
Latest commit | 2 days ago | about 1 month ago |
Language | Python | Swift |
License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
whisper
- Ask HN: Which project(s) made you go “I can't believe this is open-source”?
- Voice recognition software for German
- Do you know any API for Filipino/English (taglish) speech to text
- Mahiru's wholesome lines
  Captioning is the most time-consuming part: you have to caption what she says in every audio clip. Luckily, Whisper can do most of the heavy lifting, but you still have to check them all manually. You then feed the captions into the neural network alongside the audio samples.
- Bazarr AI subs
  Check https://github.com/openai/whisper & https://github.com/ahmetoner/whisper-asr-webservice
- Is there a plugin for transcription work?
- Would you guys like this project?
  I'm planning a project based on Whisper, https://github.com/openai/whisper. It's basically a model for speech transcription, i.e., you upload a file and get the transcribed speech back. Most people, however, aren't tech-savvy enough to clone the repo and set it up themselves. I'm thinking of making a site (free of charge, of course) where people can use it, running the best model, the one with 769M parameters. It requires a lot of memory, so most people can't run such a model themselves. That's why I think it'll be useful, as I'll be using distributed servers to offer it.
- Archiving radio stations globally and transcribing them with AI - thoughts?
- Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
- Using OpenAI's Whisper for Discord Voice Channel Transcription and Summarization
  OpenAI recently released Whisper, its new amazing speech-to-text tool. Given GPT, I thought I'd see if I could connect some pipes, and boy, am I pleased with the results.
NotificationBanner
We haven't tracked posts mentioning NotificationBanner yet.
Tracking mentions began in Dec 2020.
What are some alternatives?
vosk-api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
silero-vad - Silero VAD: pre-trained enterprise-grade Voice Activity Detector
whisper.cpp - Port of OpenAI's Whisper model in C/C++
NeMo - NeMo: a toolkit for conversational AI
buzz - Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
text-generation-webui - A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
SwiftMessages - A very flexible message bar for iOS written in Swift.
Whisper - Whisper is a component that makes the task of displaying messages and in-app notifications simple. It has three different views inside.
openai-whisper-realtime - A quick experiment to achieve almost realtime transcription using Whisper.
TensorRT - NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
FCAlertView - FCAlertView is a Flat Customizable AlertView for iOS (Swift)
stable-diffusion - A latent text-to-image diffusion model