| | SpecVQGAN | nn |
|---|---|---|
| Mentions | 2 | 26 |
| Stars | 318 | 48,430 |
| Growth | - | 4.5% |
| Activity | 2.2 | 7.7 |
| Latest commit | 11 months ago | about 1 month ago |
| Language | Jupyter Notebook | Jupyter Notebook |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
SpecVQGAN
- Text-to-Audio Generation Using Instruction Tuned LLM and Latent Diffusion Model
  Excellent. Some of the theory here goes back to Oct/2021 and beyond [1].
  The riffusion.com [2] guys made this practical. Also, here is my video with a high-level overview and examples [3].
  1. SpecVQGAN: https://github.com/v-iashin/SpecVQGAN
  2. Riffusion: https://www.riffusion.com/
  3. Riffusion high-level overview: https://youtu.be/olkLVGcvib8
- "Taming Visually Guided Sound Generation". Quickly generate audio matching a given video. Code includes a Google Colab.
nn
- Can't remember name of website that has explanations side-by-side with code
  Hey, are you talking about https://nn.labml.ai/?
- [D] Recent ML papers to implement from scratch
- [P] GPT-NeoX inference with LLM.int8() on 24GB GPU
  Implementation & LM Eval Harness Results
- [P] Fine-tuned the GPT-Neox Model to Generate Quotes
  GitHub: https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/neox
- Best resources to learn recent transformer papers and stay updated [D]
  Regarding implementations, this helps me: https://nn.labml.ai/
- Introductory papers to implement
- How to convert research papers to code?
- [D] How to convert papers to code?
  Dunno if this is directly helpful, but this website has implementations with the math side by side: https://nn.labml.ai/
- [D] Looking for open source projects to contribute
- Resource for papers explanation
What are some alternatives?
poolformer - PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
GFPGAN-for-Video-SR - A colab notebook for video super resolution using GFPGAN
vid2cleantxt - Python API & command-line tool to easily transcribe speech-based video files into clean text
labml - 🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱
MoViNet-pytorch - MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition
functorch - functorch is JAX-like composable function transforms for PyTorch.
ru-dalle - Generate images from texts. In Russian
ZoeDepth - Metric depth estimation from a single image
awesome-python-applications - 💿 Free software that works great, and also happens to be open-source Python.
onnx-simplifier - Simplify your onnx model
BMT - Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
Basic-UI-for-GPT-J-6B-with-low-vram - A repository to run gpt-j-6b on low vram machines (4.2 gb minimum vram for 2000 token context, 3.5 gb for 1000 token context). Model loading takes 12gb free ram.