SpecVQGAN
poolformer
Metric | SpecVQGAN | poolformer |
---|---|---|
Mentions | 2 | 3 |
Stars | 318 | 1,226 |
Growth | - | 0.0% |
Activity | 2.2 | 0.0 |
Last commit | 11 months ago | over 1 year ago |
Language | Jupyter Notebook | Jupyter Notebook |
License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
SpecVQGAN
- Text-to-Audio Generation Using Instruction Tuned LLM and Latent Diffusion Model
Excellent. Some of the theory here goes back to Oct/2021 and beyond [1].
The riffusion.com [2] guys made this practical. Also, my video of high-level overview and examples [3].
1. SpecVQGAN: https://github.com/v-iashin/SpecVQGAN
2. Riffusion: https://www.riffusion.com/
3. Riffusion high-level overview: https://youtu.be/olkLVGcvib8
- "Taming Visually Guided Sound Generation". Quickly generate audio matching a given video. Code includes a Google Colab.
poolformer
- Researchers from Sea AI Lab and National University of Singapore Introduce ‘PoolFormer’: A Derived Model from MetaFormer for Computer Vision Tasks
GitHub: https://github.com/sail-sg/poolformer
- [D] Are Image Transformers Overhyped? "MetaFormer is all you need" explained (5-minute summary by Casual GAN Papers)
arxiv / code
- [P] Fine-tuning the new PoolFormer (MetaFormer) model on a Kaggle Competitions Dataset
Code for https://arxiv.org/abs/2111.11418 found: https://github.com/sail-sg/poolformer
What are some alternatives?
vid2cleantxt - Python API & command-line tool to easily transcribe speech-based video files into clean text
pytorch-seq2seq - Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
MoViNet-pytorch - MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition
HugsVision - HugsVision is an easy-to-use huggingface wrapper for state-of-the-art computer vision
ru-dalle - Generate images from texts. In Russian
TFLiteClassification - TensorFlow Lite Image Classification Python Implementation
awesome-python-applications - 💿 Free software that works great, and also happens to be open-source Python.
FunMatch-Distillation - TF2 implementation of knowledge distillation using the "function matching" hypothesis from https://arxiv.org/abs/2106.05237.
BMT - Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
Transformer-in-Transformer - An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches
WOLOF-ASR-Wav2Vec2 - Audio preprocessing and fine-tuning of the wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.