How can I create a dataset to refine Whisper AI from old videos with subtitles?

finetuner

Mentions: 0 | Stars: 1,414 | Activity: 5.5 | Language: Python

🎯 Task-oriented embedding tuning for BERT, CLIP, etc.

You can try creating your own dataset: collect the audio you want, preprocess it, and then turn it into a custom dataset for fine-tuning. You could also use finetuners like these if you like.
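For the "old videos with subtitles" case, the preprocessing step mostly means turning each subtitle cue into a short audio clip paired with its text. Below is a minimal sketch, assuming the subtitles are SubRip (.srt) files and that ffmpeg is installed; `parse_srt` and `ffmpeg_clip_command` are hypothetical helpers, not part of any project listed here. It parses the cues and builds (without running) an ffmpeg command that extracts each cue as 16 kHz mono WAV, the sample rate Whisper works with.

```python
import re
from dataclasses import dataclass

# SRT timestamp: HH:MM:SS,mmm (some files use '.' instead of ',')
_TS = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
_TIMING = re.compile(_TS + r"\s*-->\s*" + _TS)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SubRip text into cues; cue blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the (possibly multi-line) text
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                # Drop simple formatting tags such as <i>...</i>
                text = re.sub(r"<[^>]+>", "", text)
                if text:
                    cues.append(Cue(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
                break
    return cues


def ffmpeg_clip_command(video: str, cue: Cue, out_wav: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that cuts one cue as 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y", "-i", video,
        # -ss/-to placed after -i: accurate seeking on the original timeline
        "-ss", f"{cue.start:.3f}", "-to", f"{cue.end:.3f}",
        "-vn", "-ac", "1", "-ar", "16000", out_wav,
    ]
```

Each command can then be executed with `subprocess.run(cmd, check=True)`. Reading the .srt with `encoding="utf-8-sig"` strips a byte-order mark if one is present. Subtitle timings can drift from the actual speech, so it is worth listening to a few clips before training on them.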
community-events

Mentions: 0 | Stars: 375 | Activity: 7.2 | Language: Jupyter Notebook

Place where folks can contribute to 🤗 community events

For the training, I highly recommend checking out the Whisper Fine-Tuning Event. It has a Python script that trains in a single command, lots of tips, and even a walkthrough video.
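Before subtitle clips go into a training script such as the event's, the cues usually need grouping: Whisper processes audio in 30-second windows, and individual subtitle cues are often only a few seconds long. The sketch below is an illustration only and does not reproduce anything from the event's script. `merge_cues` and `write_audiofolder_metadata` are hypothetical helpers, and the greedy strategy, the 1-second gap threshold, and the `transcription` column name are arbitrary assumptions. It merges consecutive `(start, end, text)` cues into chunks of at most 30 seconds and writes a `metadata.csv` in the layout Hugging Face's AudioFolder loader reads (a required `file_name` column plus extra columns).

```python
import csv
import io

MAX_SECONDS = 30.0  # Whisper's input window length


def merge_cues(cues, max_seconds=MAX_SECONDS, max_gap=1.0):
    """Greedily merge consecutive (start, end, text) cues into chunks no longer
    than max_seconds; start a new chunk when the gap between cues exceeds max_gap."""
    chunks = []
    for start, end, text in cues:
        if end - start > max_seconds:
            continue  # a single over-long cue cannot fit in one window; skip it
        if chunks:
            c_start, c_end, c_text = chunks[-1]
            if start - c_end <= max_gap and end - c_start <= max_seconds:
                chunks[-1] = (c_start, end, f"{c_text} {text}")
                continue
        chunks.append((start, end, text))
    return chunks


def write_audiofolder_metadata(chunks, out, prefix="clip"):
    """Write AudioFolder-style metadata (file_name + transcription) to a file-like
    object and return the clip file names the audio files should be saved under."""
    writer = csv.writer(out)
    writer.writerow(["file_name", "transcription"])
    names = []
    for i, (_, _, text) in enumerate(chunks):
        name = f"{prefix}_{i:04d}.wav"
        writer.writerow([name, text])
        names.append(name)
    return names
```

With the WAV clips cut to match each chunk's start and end and saved next to `metadata.csv`, the folder can be loaded with `datasets.load_dataset("audiofolder", data_dir=...)`; the training script then needs to be pointed at whichever text column you chose.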
mimic-recording-studio

Mentions: 0 | Stars: 485 | Activity: 0.0 | Language: JavaScript

Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic2.

I weirdly can't find a great off-the-shelf app for this. I'd love to know if anyone finds one. Most tools seem to be for recording data for text-to-speech (going the other way). Mimic Recording Studio looks the best. Then there's speech training recorder and TTS Dataset Creator (video). You don't have to worry about audio quality as much as those TTS tools do.
speech-training-recorder

Mentions: 0 | Stars: 35 | Activity: 10.0 | Language: Python

Simple GUI application to help record audio dictated from given text prompts, for use with training speech recognition or speech synthesis.

I weirdly can't find a great off-the-shelf app for this. l'd love to know if anyone finds one. Most stuff seems to be for recording data for Text To Speech (going the other way). Mimic Recording Studio looks the best. Then there's speech training recorder and TTS Dataset Creator (video). You don't have to worry about audio quality as much as they do.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project