How can I create a dataset to refine Whisper AI from old videos with subtitles?

This page summarizes the projects mentioned and recommended in the original post on /r/OpenAI

  • finetuner

    🎯 Task-oriented embedding tuning for BERT, CLIP, etc.

    You can try creating your own dataset. Get the audio data you want, preprocess it, and then build a custom dataset you can use for fine-tuning. You could also use fine-tuning tools like these if you want.

  • community-events

    Place where folks can contribute to 🤗 community events

    For the training, I highly recommend checking out the Whisper Fine-Tuning Event. It has a Python script to train in one command, tons of tips, and even a walkthrough video.

  • mimic-recording-studio

    Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be used to train a TTS voice with Mimic2.

    I weirdly can't find a great off-the-shelf app for this. I'd love to know if anyone finds one. Most stuff seems to be for recording data for text-to-speech (going the other way). Mimic Recording Studio looks the best. Then there's Speech Training Recorder and TTS Dataset Creator (video). You don't have to worry about audio quality as much as they do.

  • speech-training-recorder

    Simple GUI application to help record audio dictated from given text prompts, for use with training speech recognition or speech synthesis.
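For the original question (old videos that already have subtitles), the "get audio, preprocess it, create a custom dataset" advice could start from the subtitle timings themselves. The sketch below is only an illustration, not something taken from the thread: it assumes the subtitles are in SRT format, and the names `Cue`, `parse_srt`, and `group_cues` are hypothetical helpers. It parses cues into timed text and merges neighbours into segments of at most 30 seconds, since Whisper processes audio in 30-second windows.

```python
import re
from dataclasses import dataclass

# SRT timing lines look like "00:01:02,500 --> 00:01:05,000".
_TIME = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Parse SRT text into a list of Cue objects (hypothetical helper)."""
    cues = []
    # Cue blocks are separated by blank lines.
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIME.search(line)
            if not match:
                continue
            g = match.groups()
            text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
            # Drop simple formatting tags such as <i>...</i>.
            text = re.sub(r"<[^>]+>", "", text).strip()
            if text:
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
            break
    return cues


def _merge(cues):
    return Cue(cues[0].start, cues[-1].end, " ".join(c.text for c in cues))


def group_cues(cues, max_seconds=30.0):
    """Merge consecutive cues into segments spanning at most max_seconds.

    A single cue that is already longer than max_seconds is kept as-is.
    """
    segments, current = [], []
    for cue in cues:
        if current and cue.end - current[0].start > max_seconds:
            segments.append(_merge(current))
            current = []
        current.append(cue)
    if current:
        segments.append(_merge(current))
    return segments


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\n<i>How are you?</i>\n"
    )
    for segment in group_cues(parse_srt(sample)):
        print(segment)
```

Each segment's `start` and `end` could then be used to cut the matching audio clip from the video (for example with a tool such as ffmpeg), giving audio and transcript pairs to load into whatever fine-tuning script you choose. Subtitles on old videos are often loosely timed or paraphrased, so spot-checking a few segments by ear before training is probably worthwhile.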
