denoising-diffusion-pytorch
jukebox
Metric | denoising-diffusion-pytorch | jukebox
---|---|---
Mentions | 11 | 129
Stars | 6,994 | 7,563
Growth | - | 1.8%
Activity | 8.6 | 0.0
Last commit | 14 days ago | about 2 months ago
Language | Python | Python
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
denoising-diffusion-pytorch
- Commits · lucidrains/denoising-diffusion-pytorch
- Help using torchaudio and spectrograms for diffusion
I’m trying to train a diffusion model using this code (https://github.com/lucidrains/denoising-diffusion-pytorch). My idea is to take a short audio segment, transform it into a spectrogram, train the model on these images, have it generate spectrograms, and then convert them back to audio. However, the model requires square images, and I cannot for the life of me figure out how to make a square spectrogram. Also, is a regular spectrogram or a mel spectrogram better for this application?
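One possible way to get a square spectrogram, sketched under stated assumptions: with centre padding (the default for `torch.stft` and `torchaudio.transforms.MelSpectrogram`), the number of time frames is `1 + num_samples // hop_length`, so the segment can be cropped until the frame count equals the number of mel bins. The helper names, the 128-pixel side, the hop length of 256 and the `n_fft` choice below are illustrative assumptions, not something taken from the thread or from the repository.

```python
def num_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of a centre-padded STFT (center=True)."""
    return 1 + num_samples // hop_length


def square_segment_length(side: int, hop_length: int) -> int:
    """Shortest segment, in samples, that yields exactly `side` frames."""
    return (side - 1) * hop_length


def square_mel(waveform, sample_rate: int, side: int = 128, hop_length: int = 256):
    """Crop `waveform` (channels, samples) and return a
    (channels, side, side) mel spectrogram.

    Hypothetical helper: side, hop_length and n_fft are assumed values.
    """
    import torchaudio  # imported lazily so the helpers above run without it

    length = square_segment_length(side, hop_length)
    if waveform.shape[-1] < length:
        raise ValueError(f"need at least {length} samples, got {waveform.shape[-1]}")

    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate,
        n_fft=4 * hop_length,   # assumed window size
        hop_length=hop_length,
        n_mels=side,            # height of the image
    )
    # Width of the image is num_frames(length, hop_length) == side.
    return mel(waveform[..., :length])
```

With these assumed values, a 32,512-sample crop gives a 128 x 128 image per channel. Converting generated spectrograms back to audio is a separate step that this sketch does not cover.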
- Implementation of Google's MusicLM in PyTorch
Generally it's without weights, but MusicLM is also a WIP; more mature implementations have descriptions of how to train them and follow-ups on small-scale/crowd-sourced experiments & research[1].
[1]: https://github.com/lucidrains/denoising-diffusion-pytorch
- [D] Time Embedding in Diffusion Model
[1] https://colab.research.google.com/drive/1sjy9odlSSy0RBVgMTgP7s99NXsqglsUL?usp=sharing#scrollTo=KOYPSxPf_LL7 [2] https://github.com/lucidrains/denoising-diffusion-pytorch/blob/main/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py
- [D] Can a Diffusion Model be trained with an NVIDIA TITAN X?
Sure. I am using: https://github.com/lucidrains/denoising-diffusion-pytorch
- [D] Resources to learn and fully understand Diffusion Model Codes
Lucidrains' GitHub is always my go-to repo for understandable paper implementations: https://github.com/lucidrains/denoising-diffusion-pytorch
- Diffusion model generated exactly the same image as the training image
Thanks for the reply. Do you have any suggestions for what I should do if I want to train a model to generate half-cat, half-butterfly images? I git cloned the code from https://github.com/lucidrains/denoising-diffusion-pytorch and trained from scratch.
- [D] Best diffusion model archetype to train?
DDIM/DDPM are the same model to train, they only differ at inference time. To start I would recommend building from lucidrains' MIT licenced version (https://github.com/lucidrains/denoising-diffusion-pytorch). Just play around with the models until you gain an intuition.
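A toy scalar sketch of that point, under stated assumptions: both samplers consume the same noise-prediction network trained with the same loss, and only the update rule used at inference differs. The function names, the linear beta schedule (1e-4 to 0.02 over 1000 steps) and the scalar "images" are illustrative choices, not code from the repository; `eps_model` is a hypothetical stand-in for the trained network.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed) and cumulative products alpha_bar."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return betas, alpha_bar


def q_sample(x0, eps, abar_t):
    """Forward process: noise x0 to the level given by abar_t."""
    return math.sqrt(abar_t) * x0 + math.sqrt(1.0 - abar_t) * eps


def training_loss(eps_model, x0, t, alpha_bar, rng):
    """Shared training objective: squared error on the predicted noise."""
    eps = rng.gauss(0.0, 1.0)
    x_t = q_sample(x0, eps, alpha_bar[t])
    return (eps - eps_model(x_t, t)) ** 2


def ddpm_step(x_t, t, eps_hat, betas, alpha_bar, rng):
    """Stochastic ancestral step t -> t-1 with variance beta_t."""
    mean = (x_t - betas[t] / math.sqrt(1.0 - alpha_bar[t]) * eps_hat)
    mean /= math.sqrt(1.0 - betas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)


def ddim_step(x_t, t, t_prev, eps_hat, alpha_bar):
    """Deterministic step (eta = 0) from t to any earlier t_prev.

    t_prev = -1 means "jump to the clean sample".
    """
    x0_hat = (x_t - math.sqrt(1.0 - alpha_bar[t]) * eps_hat) / math.sqrt(alpha_bar[t])
    abar_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * eps_hat
```

Because `ddim_step` accepts any earlier `t_prev`, it can skip timesteps, while `ddpm_step` walks one step at a time; neither changes how the model is trained.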
- We just released a complete open-source solution for accelerating Stable Diffusion pretraining and fine-tuning!
Our codebase for the diffusion models builds heavily on OpenAI's ADM codebase, lucidrains, Stable Diffusion, Lightning and Hugging Face. Thanks for open-sourcing!
- [D] Introduction to Diffusion Models
Once you understand these papers you can begin to understand Palette, and from there I would start with an open-source diffusion implementation like this one and then modify it to suit your needs!
jukebox
- Open Source Libraries
openai/jukebox: Music Generation
- Will AI be able to create similar sounding music based off input?
- Best model for music generation?
https://github.com/openai/jukebox The demo code is there.
- Why didn't OpenAI MIT license Jukebox the same way they did CLIP?
I didn't even know about it until I heard Sam Altman casually mention it in an interview. I was expecting some basic tune generator, but this is so amazing! Yes, the voices are muffled rather than clear, but look at how far image models have progressed; if the same amount of collaborative effort were applied here, the results could be amazing! ElevenLabs showed how good and clear AI-created voices can sound. The only reason I can think of is that the Jukebox code is under a view-only license.
- [R] [N] Noise2Music - Diffusion models for generating high quality music audio from text prompts, by Google Research
OpenAI had this figured out 3 years ago: https://openai.com/blog/jukebox/ . You could then even define your own text. Model is open source too.
- Is music next?
They've had Jukebox for a few years now, so I'm sure some new model will get released and explode overnight, like ChatGPT did.
- Mongolian Gabba Goat Techno
That already exists
- OpenAI's continued success, and how they came to create the most advanced AI of 2023: ChatGPT.
- Implementation of Google's MusicLM in PyTorch
This model is designed to output raw audio.
However, there are many models that output MIDI. That's actually much simpler, and was already done a few years ago.
I thought OpenAI did this. But I might be misremembering, because their Jukebox also seems to produce raw audio (https://openai.com/blog/jukebox/).
However, MIDI generation is so easy that you can even find it in some tutorials: https://www.tensorflow.org/tutorials/audio/music_generation
- FREE AI THINGS
What are some alternatives?
ALAE - [CVPR2020] Adversarial Latent Autoencoders
lucid-sonic-dreams
autoregressive - Autoregressive Models in PyTorch.
ultimatevocalremovergui - GUI for a Vocal Remover that uses Deep Neural Networks.
stylegan2-pytorch - Simplest working implementation of Stylegan2, state of the art generative adversarial network, in Pytorch. Enabling everyone to experience disentanglement
spleeter - Deezer source separation library including pretrained models.
Awesome-Diffusion-Models - A collection of resources and papers on Diffusion Models
music-demixing-challenge-starter-kit - Starter kit for getting started in the Music Demixing Challenge.
RAVE - Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
dalle-mini - DALL·E Mini - Generate images from a text prompt
pytorch-lightning - Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
latent-diffusion - High-Resolution Image Synthesis with Latent Diffusion Models