Whisper - A new free AI model from OpenAI that can transcribe Japanese (and many other languages) at up to "human level" accuracy

This page summarizes the projects mentioned and recommended in the original post on /r/LearnJapanese

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

  • But yeah, I set up a new environment in Anaconda and followed the instructions on their GitHub page to install it. I then downloaded a recent 20-minute video (日本人の英語の発音の特徴!アメリカでどう思われてるの) from the Kevin's English Room YouTube channel with yt-dlp and transcribed it using the "medium" model. I picked this video because the transcription is easy to confirm: like most Japanese videos on YouTube, it has hard-coded Japanese subtitles. Transcription took about 11 minutes on a 2080 Ti, so roughly twice as fast as real time. I'd say the result is significantly better than the default YouTube auto-transcription, especially when people are speaking in multiple languages (Pastebin link).
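
A minimal sketch of what that transcription step could look like in Python, assuming the `openai-whisper` package and ffmpeg are installed and the audio has already been downloaded to a local file. The helper names (`format_timestamp`, `segments_to_lines`, `transcribe_japanese`) and the file path are illustrative and are not taken from the original post, which does not say whether the command line or the Python API was used.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments: List[Dict]) -> List[str]:
    """Turn Whisper-style segments (dicts with start, end, text) into readable lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]


def transcribe_japanese(audio_path: str, model_name: str = "medium") -> List[str]:
    """Transcribe a local audio file with Whisper and return timestamped lines."""
    # Imported lazily so the formatting helpers work without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path, language="ja")
    return segments_to_lines(result["segments"])
```

Calling `transcribe_japanese("audio.mp3")` on a hypothetical local file would return one line per recognized segment, which makes it easy to compare against the hard-coded subtitles in the video.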

  • silero-vad

    Silero VAD: pre-trained enterprise-grade Voice Activity Detector

  • And while googling this, I came across a discussion on the Whisper GitHub repository which seems to suggest that the issue is caused by the current VAD (Voice Activity Detection) being quite poor, and that it can be resolved by using a different VAD (such as silero-vad). This might be something I want to add to my WebUI in the future.
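
One way such a VAD-based workaround might be structured, as a rough sketch: a voice activity detector such as silero-vad produces speech intervals, nearby intervals are merged into chunks, and only those chunks are passed to the transcriber. The function below covers just the merging step. Its name, the `max_gap` and `max_len` parameters, and the sample timestamps are assumptions for illustration; the linked discussion may do this differently. The 30-second default reflects the window length Whisper processes at a time.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds) of detected speech


def merge_speech_segments(
    segments: List[Segment],
    max_gap: float = 0.5,
    max_len: float = 30.0,
) -> List[Segment]:
    """Merge speech intervals separated by short pauses into larger chunks.

    Two intervals are merged when the pause between them is at most `max_gap`
    seconds and the merged chunk would not exceed `max_len` seconds.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            close_enough = start - prev_end <= max_gap
            fits = end - prev_start <= max_len
            if close_enough and fits:
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Made-up timestamps standing in for VAD output.
    detected = [(0.0, 1.0), (1.3, 2.0), (5.0, 6.0)]
    for chunk_start, chunk_end in merge_speech_segments(detected):
        print(f"speech chunk: {chunk_start:.1f}s to {chunk_end:.1f}s")
```

Skipping the silent stretches between chunks means the transcriber never sees long non-speech audio, which is presumably where a weak built-in VAD causes trouble.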

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • New models and developer products announced at OpenAI DevDay

    8 projects | news.ycombinator.com | 6 Nov 2023
  • [P] A more detailed post about Silero VAD on The Gradient

    1 project | /r/MachineLearning | 19 Feb 2022
  • Silero VAD: pre-trained enterprise-grade voice activity detector

    1 project | news.ycombinator.com | 30 Dec 2021
  • [P] Silero VAD: One voice detector to rule them all

    2 projects | /r/MachineLearning | 18 Dec 2021
  • [Discussion] Video Translation Task

    2 projects | /r/MachineLearning | 13 Jul 2023