I made a free transcription service powered by Whisper AI

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision
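
    For context, transcription with the upstream Python package is only a few lines. A minimal sketch, assuming `openai-whisper` and ffmpeg are installed; the audio filename is a placeholder:

        import whisper

        # Load one of the pretrained checkpoints (tiny/base/small/medium/large).
        model = whisper.load_model("base")

        # Whisper decodes the file via ffmpeg and returns text plus timestamped segments.
        result = model.transcribe("audio.wav")
        print(result["text"])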

  • It's not as if people aren't trying to do that: https://github.com/openai/whisper/discussions/264

    I tried out this notebook about a month ago, and it was rough. After spending an evening improving it, I got everything "working", but pyannote was not reliable. I tried it against an hour-ish audio sample, and I found no way to tune pyannote to keep track of ~10 speakers over the course of that audio. It would identify some of the earlier speakers, but then it felt like it lost attention and would just start labeling every new speaker as the same speaker. There is an option to force the minimum number of speakers higher, and that just caused it to split some of the earlier speakers into multiple labels. It did nothing to address the latter half of the audio.

    So, sure, someone should continue working on putting the pieces together, but I think pyannote itself needs some improvement.

    Beyond that, I think using separate models for transcription and diarization ends up being really clunky. If you have a podcast-like environment where people get excited and start talking over each other, then even if pyannote correctly identifies all of the speakers during the overlapping segments and when they spoke (impressively, I have seen pyannote do exactly that), Whisper cannot be used to separate the speakers. You end up with either duplicate transcripts attributed to everyone involved, or something worse.
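
    As a concrete illustration of the knob described above, here is a minimal sketch of that pipeline, assuming pyannote.audio's pretrained speaker-diarization pipeline and a Hugging Face access token (the token and filename are placeholders):

        from pyannote.audio import Pipeline

        pipeline = Pipeline.from_pretrained(
            "pyannote/speaker-diarization",
            use_auth_token="HF_TOKEN",  # placeholder Hugging Face token
        )

        # min_speakers is the "force the minimum number of speakers" option;
        # it constrains the clustering step, which is why raising it can split
        # early speakers into multiple labels without fixing drift later on.
        diarization = pipeline("hour_long_sample.wav", min_speakers=10)

        for turn, _, speaker in diarization.itertracks(yield_label=True):
            print(f"{turn.start:7.1f}s - {turn.end:7.1f}s  {speaker}")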

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • This is a cool project. I’ve been very happy with Whisper as an alternative to Otter; it works better and solves real problems for me.

    I feel compelled to point out whisper.cpp. It may be a cheaper option for the author, but it is relevant for others regardless.

    I was running Whisper on a GTX 1070 to get decent performance; it was terribly slow on an M1 Mac. whisper.cpp delivers performance comparable to the 1070 while running on the M1’s CPU. It is easy to build and run, and well documented (see the sketch below).

    https://github.com/ggerganov/whisper.cpp

    I hope this doesn’t come off the wrong way, I love this project and I’m glad to see the technology democratized. Easily accessible high-quality transcription will be a game changer for many people and organizations.
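
    The sketch below shows one way to drive whisper.cpp from a script. It assumes the repo has been cloned and built per its README (`make`, plus the model download script); the binary name, flags, and paths reflect the project's documented CLI but should be verified against the repo.

        import subprocess

        # whisper.cpp expects 16 kHz mono WAV input.
        result = subprocess.run(
            [
                "./main",
                "-m", "models/ggml-base.en.bin",  # ggml model from the download script
                "-f", "samples/audio.wav",        # placeholder input file
                "-t", "8",                        # thread count; tune for your CPU
            ],
            capture_output=True,
            text=True,
            check=True,
        )
        print(result.stdout)  # timestamped transcript emitted by whisper.cpp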

  • generate-subtitles

    Generate transcripts for audio and video content with a user-friendly UI, powered by OpenAI's Whisper, with automatic translation and automatic video downloads via yt-dlp integration
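
    Under the hood, this kind of service pairs a downloader with Whisper. A rough sketch of that flow (not generate-subtitles' actual code), assuming the `yt-dlp` and `openai-whisper` Python packages, with a placeholder URL:

        import yt_dlp
        import whisper

        url = "https://example.com/watch?v=placeholder"  # hypothetical video URL

        # Fetch the best available audio track to a predictable filename.
        opts = {"format": "bestaudio", "outtmpl": "download.%(ext)s"}
        with yt_dlp.YoutubeDL(opts) as ydl:
            info = ydl.extract_info(url, download=True)
            audio_path = ydl.prepare_filename(info)

        # task="translate" would produce an English translation instead.
        result = whisper.load_model("base").transcribe(audio_path, task="transcribe")
        for seg in result["segments"]:
            print(f"{seg['start']:.2f} --> {seg['end']:.2f}: {seg['text'].strip()}")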

  • I'm just running this off of a 2x RTX A6000 server on Vast.ai at the moment, at about $1.30/h, with nginx on another server reverse-proxying to the Vast instance.

    Open an issue on the GitHub repo and we can collab for sure: https://github.com/mayeaux/generate-subtitles/issues

  • pyannote-audio

    Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

  • Free startup idea: Use Whisper with pyannote-audio[0]’s speaker diarization. Upload a recording, get back a multi-speaker annotated transcription.

    Make a JSON API and I’ll be your first customer.

    [0] https://github.com/pyannote/pyannote-audio
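
    A rough sketch of what that service's core could look like: run Whisper and pyannote-audio separately, then attribute each transcript segment to the speaker whose turn overlaps it most. The overlap heuristic is an assumption for illustration, not an established recipe (and, per the comment further up, overlapping speech remains a weak point):

        import json

        import whisper
        from pyannote.audio import Pipeline

        audio = "upload.wav"  # placeholder for the uploaded recording

        segments = whisper.load_model("base").transcribe(audio)["segments"]
        diarization = Pipeline.from_pretrained(
            "pyannote/speaker-diarization",
            use_auth_token="HF_TOKEN",  # placeholder token
        )(audio)

        def speaker_for(start, end):
            # Label the segment with the speaker whose turn overlaps it most.
            best, best_overlap = "UNKNOWN", 0.0
            for turn, _, label in diarization.itertracks(yield_label=True):
                overlap = min(end, turn.end) - max(start, turn.start)
                if overlap > best_overlap:
                    best, best_overlap = label, overlap
            return best

        annotated = [
            {
                "start": seg["start"],
                "end": seg["end"],
                "speaker": speaker_for(seg["start"], seg["end"]),
                "text": seg["text"].strip(),
            }
            for seg in segments
        ]
        print(json.dumps(annotated, indent=2))  # the JSON an API could return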

  • whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  • I think there's been talk of doing speaker diarization with whisper-asr-webservice[0], which is also written in Python and should be able to make use of goodies such as pyannote-audio, py-webrtcvad, etc.

    Whisper is great, but at the point where we're kludging various things together it starts to make more sense to use something like NVIDIA NeMo[1], which was built with all of this in mind and more.

    [0] - https://github.com/ahmetoner/whisper-asr-webservice

    [1] - https://github.com/NVIDIA/NeMo
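
    For reference, the webservice exposes a simple HTTP endpoint. A minimal client sketch, assuming the service is running locally on its default port 9000; the endpoint and field names follow its README, but verify against the repo:

        import requests

        with open("audio.wav", "rb") as f:  # placeholder audio file
            resp = requests.post(
                "http://localhost:9000/asr",
                params={"task": "transcribe", "language": "en", "output": "json"},
                files={"audio_file": f},
            )
        resp.raise_for_status()
        print(resp.json())  # transcript plus per-segment timestamps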

  • NeMo

    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
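
    A minimal NeMo ASR sketch, assuming the `nemo_toolkit[asr]` package; the model name and the exact `transcribe()` signature vary across NeMo releases, so treat this as an outline rather than a pinned recipe:

        import nemo.collections.asr as nemo_asr

        # Pull a pretrained English ASR model from NVIDIA's catalog.
        asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")

        # Transcribe a local file (placeholder path).
        transcripts = asr_model.transcribe(["audio.wav"])
        print(transcripts[0])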


