WOLOF-ASR-Wav2Vec2
SpecVQGAN
Metric | WOLOF-ASR-Wav2Vec2 | SpecVQGAN
---|---|---
Mentions | 2 | 2
Stars | 12 | 318
Growth | - | -
Activity | 0.0 | 2.2
Last commit | over 2 years ago | 11 months ago
Language | Jupyter Notebook | Jupyter Notebook
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
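The exact formula behind the activity number is not given here. As a rough illustration only, the sketch below assumes two things that are not from the source: each commit's weight halves every `half_life_days` (a hypothetical parameter), and the final score is ten times the project's percentile rank among tracked projects, so that 9.0 corresponds to the top 10%. The function names are made up for this example.

```python
from datetime import date


def weighted_commit_score(commit_dates, today, half_life_days=90.0):
    """Sum per-commit weights; a commit's weight halves every half_life_days.

    The exponential decay and the 90-day half-life are assumptions.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_scores(raw_scores):
    """Map raw scores to a 0-10 scale by percentile rank (assumed mapping).

    A project with activity 9.0 has a raw score higher than 90% of the
    tracked projects, which places it in the top 10%.
    """
    n = len(raw_scores)
    result = {}
    for name, score in raw_scores.items():
        below = sum(1 for other in raw_scores.values() if other < score)
        result[name] = round(10.0 * below / n, 1)
    return result


if __name__ == "__main__":
    today = date(2024, 1, 1)
    # A commit made today counts fully; one made 90 days ago counts half.
    print(weighted_commit_score([date(2024, 1, 1)], today))
    print(weighted_commit_score([date(2023, 10, 3)], today))
    # Ten hypothetical projects with raw scores 0..9.
    print(activity_scores({f"project-{i}": float(i) for i in range(10)}))
```

Under these assumptions, a project with no recent commits drifts toward 0.0 even if it was once very active, which is consistent with the 0.0 shown for a repository last updated over two years ago.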
WOLOF-ASR-Wav2Vec2
- My first contribution to Hugging Face
I have fine-tuned wav2vec2 large xlsr53 on a Wolof audio dataset; for more info, visit here. You can also check my GitHub repo and my Kaggle notebook.
- [P] Finetuning Facebook wav2vec2 large xlsr model on Wolof audio data
SpecVQGAN
- Text-to-Audio Generation Using Instruction Tuned LLM and Latent Diffusion Model
Excellent. Some of the theory here goes back to Oct/2021 and beyond [1].
The riffusion.com [2] guys made this practical. See also my video with a high-level overview and examples [3].
1. SpecVQGAN: https://github.com/v-iashin/SpecVQGAN
2. Riffusion: www.riffusion.com/
3. Riffusion high-level overview: https://youtu.be/olkLVGcvib8
- "Taming Visually Guided Sound Generation". Quickly generate audio matching a given video. Code includes a Google Colab.
What are some alternatives?
awesome-deep-learning-music - List of articles related to deep learning applied to music
poolformer - PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
awesome-python-applications - 💿 Free software that works great, and also happens to be open-source Python.
vid2cleantxt - Python API & command-line tool to easily transcribe speech-based video files into clean text
essentia - C++ library for audio and music analysis, description and synthesis, including Python bindings
MoViNet-pytorch - MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition
auto-editor - Auto-Editor: Effort free video editing!
ru-dalle - Generate images from texts. In Russian
OTTO - Sampler, Sequencer, Multi-engine synth and effects - in a box! [WIP]
beep - A little package that brings sound to any Go application. Suitable for playback and audio-processing.
BMT - Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)