How can I create a dataset to refine Whisper AI from old videos with subtitles?

This page summarizes the projects mentioned and recommended in the original post on /r/OpenAI.

  • finetuner

    Task-oriented embedding tuning for BERT, CLIP, etc.

    You can try creating your own dataset: collect the audio you want, preprocess it, and then build a custom dataset you can use for fine-tuning. You could also use finetuners like these if you want.

  • community-events

    Place where folks can contribute to 🤗 community events

    For the training, I highly recommend checking out the Whisper Fine-Tuning Event. It has a Python script that trains in one command, plenty of tips, and even a walkthrough video.

  • mimic-recording-studio

    Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic2.

    I weirdly can't find a great off-the-shelf app for this. I'd love to know if anyone finds one. Most tools seem to be built for recording Text-to-Speech data (going the other way). Mimic Recording Studio looks the best. Then there's speech-training-recorder and TTS Dataset Creator (video). You don't have to worry about audio quality as much as they do.

  • speech-training-recorder

    Simple GUI application to help record audio dictated from given text prompts, for use with training speech recognition or speech synthesis.
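
The dataset advice above (collect audio, preprocess it, build a custom dataset) can be sketched for the subtitled-video case. This is a minimal, standard-library-only sketch under several assumptions: the subtitles are in SRT format, ffmpeg is installed, and the helper names `parse_srt`, `merge_cues`, `ffmpeg_command` and `metadata_lines` are hypothetical. Cues are merged into segments of at most 30 seconds because Whisper processes audio in 30-second windows, and clips are extracted as 16 kHz mono because that is the sampling rate Whisper's feature extractor expects. The `file_name`/`transcription` JSON Lines layout is an assumption chosen to resemble the `metadata.jsonl` file read by the Hugging Face `datasets` "audiofolder" loader; check its documentation before relying on it.

```python
import json
import re
from dataclasses import dataclass

# Matches SRT timing lines such as "00:01:02,500 --> 00:01:05,000"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float  # seconds
    text: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT text: each block is an index line, a timing line, then text lines."""
    cues = []
    # Blocks are separated by blank lines; \s* also absorbs Windows "\r" characters
    for block in re.split(r"\n\s*\n", srt_text.lstrip("\ufeff").strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match is None:
                continue
            groups = match.groups()
            start, end = _to_seconds(*groups[:4]), _to_seconds(*groups[4:])
            # Join the remaining lines and drop formatting tags such as <i>...</i>
            text = re.sub(r"<[^>]+>", "", " ".join(lines[i + 1:])).strip()
            if text:
                cues.append(Cue(start, end, text))
            break
    return cues


def merge_cues(cues: list[Cue], max_seconds: float = 30.0, max_gap: float = 2.0) -> list[Cue]:
    """Greedily join consecutive cues into segments of at most max_seconds.

    A new segment also starts when the silence between cues exceeds max_gap
    (a hypothetical default). A single cue longer than max_seconds is kept as-is.
    """
    segments: list[Cue] = []
    current = None
    for cue in cues:
        fits = current is not None and cue.end - current.start <= max_seconds
        close = current is not None and cue.start - current.end <= max_gap
        if fits and close:
            current = Cue(current.start, cue.end, f"{current.text} {cue.text}")
        else:
            if current is not None:
                segments.append(current)
            current = cue
    if current is not None:
        segments.append(current)
    return segments


def ffmpeg_command(video_path: str, seg: Cue, out_wav: str) -> list[str]:
    """Build an ffmpeg call that extracts one segment as 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-ss", f"{seg.start:.3f}", "-t", f"{seg.end - seg.start:.3f}",
        "-vn", "-ac", "1", "-ar", "16000", out_wav,
    ]


def metadata_lines(segments: list[Cue], clip_prefix: str) -> list[str]:
    """One JSON line per clip, pairing a clip file name with its transcript."""
    return [
        json.dumps(
            {"file_name": f"{clip_prefix}_{i:05d}.wav", "transcription": seg.text},
            ensure_ascii=False,
        )
        for i, seg in enumerate(segments)
    ]


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n\n2\n00:00:04,000 --> 00:00:06,000\n<i>How are you?</i>\n"
    segs = merge_cues(parse_srt(sample))
    for i, seg in enumerate(segs):
        print(ffmpeg_command("episode01.mp4", seg, f"episode01_{i:05d}.wav"))
    print("\n".join(metadata_lines(segs, "episode01")))
```

To use it, run each `ffmpeg_command` list with `subprocess.run(cmd, check=True)` and save the `metadata_lines` output as `metadata.jsonl` next to the clips. Subtitles on old videos may be shifted relative to the audio or may condense what was actually said, so it is worth spot-checking a few clips by ear before training on them.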
