SpecVQGAN vs WOLOF-ASR-Wav2Vec2
Metric | SpecVQGAN | WOLOF-ASR-Wav2Vec2
---|---|---
Mentions | 2 | 2
Stars | 318 | 12
Growth | - | -
Activity | 2.2 | 0.0
Latest commit | 11 months ago | over 2 years ago
Language | Jupyter Notebook | Jupyter Notebook
License | MIT License | Apache License 2.0
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
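The Growth and Activity definitions above can be made concrete with a small sketch. The exact Activity formula is not given here, so the exponential decay by commit age (`half_life_days`) and the mapping of a percentile rank onto a 0 to 10 scale are assumptions chosen only to illustrate "recent commits weigh more" and "9.0 means top 10%". All function names are hypothetical.

```python
import math
from datetime import date


def month_over_month_growth(stars_now, stars_month_ago):
    """Percent change in GitHub stars over one month."""
    if stars_month_ago == 0:
        return math.nan  # undefined when there were no stars a month ago
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_dates, today, half_life_days=30.0):
    """HYPOTHETICAL weighting: each commit counts 0.5 ** (age / half_life),
    so a commit from today counts 1.0 and one from a half-life ago counts 0.5."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def relative_activity(score, all_scores):
    """HYPOTHETICAL mapping of a raw score onto 0-10 by the share of tracked
    projects it beats: beating 90% of projects gives 9.0 (the top 10%)."""
    beaten = sum(1 for s in all_scores if s < score)
    return round(10.0 * beaten / len(all_scores), 1)


if __name__ == "__main__":
    today = date(2024, 1, 31)
    commits = [date(2024, 1, 31), date(2024, 1, 1)]  # ages: 0 and 30 days
    raw = recency_weighted_commits(commits, today)
    print(month_over_month_growth(110, 100))  # 10.0
    print(raw)  # 1.5
    print(relative_activity(raw, [0.0, 0.2, 0.5, 0.8, 1.0, 1.2, 1.3, 1.4, 1.45, 2.0]))  # 9.0
```

Ranking against the other tracked projects, rather than using the raw commit score, is what makes the number "relative": the same commit history can score differently as the tracked population changes.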
SpecVQGAN
- Text-to-Audio Generation Using Instruction Tuned LLM and Latent Diffusion Model
Excellent. Some of the theory here goes back to October 2021 and beyond [1].
The riffusion.com [2] guys made this practical. Also, here is my video with a high-level overview and examples [3].
1. SpecVQGAN: https://github.com/v-iashin/SpecVQGAN
2. Riffusion: https://www.riffusion.com/
3. Riffusion high-level overview: https://youtu.be/olkLVGcvib8
- "Taming Visually Guided Sound Generation". Quickly generate audio matching a given video. Code includes a Google Colab.
WOLOF-ASR-Wav2Vec2
- My first contribution to Hugging Face
I have fine-tuned wav2vec2-large-xlsr-53 on a Wolof audio dataset; for more info, visit it here. You can also check my GitHub repo or look at my Kaggle notebook.
- [P] Fine-tuning Facebook wav2vec2 large xlsr model on Wolof audio data
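Fine-tuning wav2vec2-large-xlsr-53 for speech recognition typically means training a CTC output layer over a character vocabulary built from the transcripts. The sketch below follows the conventions of Hugging Face's XLSR fine-tuning recipe (spaces become the word delimiter `|`, plus `[UNK]` and `[PAD]` tokens, with `[PAD]` acting as the CTC blank). It is an assumption, not necessarily what this project did; `build_ctc_vocab` and `encode` are hypothetical helpers, and the two short strings are toy transcripts.

```python
def build_ctc_vocab(transcripts):
    """Character vocabulary for a CTC output layer (assumed HF-style tokens)."""
    # Every lower-cased character seen in the transcripts, except the space,
    # which is represented by the visible word delimiter "|" instead.
    chars = sorted({ch for text in transcripts for ch in text.lower()} - {" "})
    # Assumed layout: delimiter first, then characters, then [UNK] and [PAD]
    # ([PAD] doubles as the CTC blank token).
    tokens = ["|"] + [c for c in chars if c != "|"] + ["[UNK]", "[PAD]"]
    return {tok: i for i, tok in enumerate(tokens)}


def encode(text, vocab):
    """Turn a transcript into label ids; unseen characters become [UNK]."""
    unk = vocab["[UNK]"]
    return [vocab.get("|" if ch == " " else ch, unk) for ch in text.lower()]


if __name__ == "__main__":
    vocab = build_ctc_vocab(["Na nga def", "Mangi fi"])
    print(vocab)
    print(encode("na nga", vocab))  # [8, 1, 0, 8, 5, 1]
```

In that recipe, such a mapping is written to `vocab.json` and loaded with `Wav2Vec2CTCTokenizer` from the `transformers` library. Wolof letters such as `ë` or `ñ` simply become extra entries, as long as they appear in the training transcripts.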
What are some alternatives?
poolformer - PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
awesome-deep-learning-music - List of articles related to deep learning applied to music
vid2cleantxt - Python API & command-line tool to easily transcribe speech-based video files into clean text
awesome-python-applications - 💿 Free software that works great, and also happens to be open-source Python.
MoViNet-pytorch - MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition
essentia - C++ library for audio and music analysis, description and synthesis, including Python bindings
ru-dalle - Generate images from texts. In Russian
auto-editor - Auto-Editor: Effort free video editing!
OTTO - Sampler, Sequencer, Multi-engine synth and effects - in a box! [WIP]
BMT - Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
beep - A little package that brings sound to any Go application. Suitable for playback and audio-processing.