But yeah, I set up a new environment in Anaconda and followed the instructions on their GitHub page to install it. I then used the "medium" model to transcribe a recent 20-minute video (日本人の英語の発音の特徴!アメリカでどう思われてるの — "Features of Japanese English pronunciation! What do Americans think of it?") from the Kevin's English Room YouTube channel, downloaded with yt-dlp. It's easy to check the transcription quality because the video has hard-coded Japanese subtitles, as most Japanese videos on YouTube do. Transcribing took about 11 minutes on a 2080 Ti, so roughly 2x real-time speed. And I'd say the result is significantly better than YouTube's default auto-transcription, especially when people are speaking in multiple languages (Pastebin link):
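For anyone wanting to try the same workflow, a minimal sketch of the two commands involved — the video URL is a placeholder, and the output filename is an assumption:

```shell
# Download just the audio track with yt-dlp (URL is a placeholder)
yt-dlp -x --audio-format mp3 -o audio.mp3 "https://www.youtube.com/watch?v=VIDEO_ID"

# Transcribe with the "medium" Whisper model; language is auto-detected
whisper audio.mp3 --model medium
```

Whisper writes the transcript in several formats (.txt, .srt, .vtt, etc.) next to the input file by default.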
And while googling this, I stumbled upon a discussion on the Whisper GitHub repository which seems to suggest that the issue is Whisper's rather poor built-in VAD (Voice Activity Detection), and that it can be worked around by running a separate VAD (like silero-vad) first. This might be something I want to add to my WebUI in the future.
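To make the VAD idea concrete: a VAD scans the audio and flags which frames contain speech, so silent stretches can be cut out before transcription. silero-vad does this with a trained neural network; the following is only a toy energy-based sketch of the same interface, with a made-up threshold, not how silero-vad actually works:

```python
import numpy as np

def energy_vad(samples, sample_rate=16000, frame_ms=30, threshold=0.02):
    """Toy VAD: flag each frame whose RMS energy exceeds a threshold.
    Real VADs like silero-vad use a trained model instead of raw energy."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    flags = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        flags.append(bool(rms > threshold))
    return flags

# Synthetic test signal: 0.5 s silence, 0.5 s of a 440 Hz tone, 0.5 s silence
sr = 16000
t = np.linspace(0, 0.5, sr // 2, endpoint=False)
tone = 0.1 * np.sin(2 * np.pi * 440 * t)
silence = np.zeros(sr // 2)
signal = np.concatenate([silence, tone, silence])

flags = energy_vad(signal, sr)  # True only for frames inside the tone
```

Segments where the flags are True would then be concatenated and fed to Whisper, which avoids the model hallucinating text during long silences.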