| | Machine-Learning-Collection | a-PyTorch-Tutorial-to-Image-Captioning |
|---|---|---|
| Mentions | 9 | 1 |
| Stars | 6,991 | 2,657 |
| Growth | - | - |
| Activity | 3.5 | 0.0 |
| Last commit | 3 months ago | almost 2 years ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Machine-Learning-Collection
-
Building an AI Game Bot 🤖Using Imitation Learning and 3D Convolution ResNet
    import torch
    from tqdm import tqdm


    def compute_mean_std(dataloader):
        '''
        We assume that the images of the dataloader have the same height and width
        source: https://github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/Basics/pytorch_std_mean.py
        '''
        # var[X] = E[X**2] - E[X]**2
        channels_sum, channels_sqrd_sum, num_batches = 0, 0, 0

        for batch_images, labels in tqdm(dataloader):
            # The 5-index permute implies 5-D batches; assuming (B, C, T, H, W),
            # this moves the channels last: (B, H, W, T, C)
            batch_images = batch_images.permute(0, 3, 4, 2, 1)
            # Average over every dimension except the last (channel) one
            channels_sum += torch.mean(batch_images, dim=[0, 1, 2, 3])
            channels_sqrd_sum += torch.mean(batch_images ** 2, dim=[0, 1, 2, 3])
            num_batches += 1

        mean = channels_sum / num_batches
        std = (channels_sqrd_sum / num_batches - mean ** 2) ** 0.5
        return mean, std


    compute_mean_std(dataloader)
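The running-sum approach in the snippet above relies on the identity var[X] = E[X**2] - E[X]**2. As a rough check, the sketch below applies the same idea to synthetic data and compares the result with a single pass over everything. The tensor shape (B, C, T, H, W), the number of batches, and all variable names are assumptions made for illustration and are not part of the original post.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a dataloader: 4 equal-sized batches of
# 5-D clips shaped (B, C, T, H, W). The shape is an assumption.
batches = [torch.rand(2, 3, 4, 8, 8) for _ in range(4)]

channels_sum = torch.zeros(3)
channels_sqrd_sum = torch.zeros(3)
for batch in batches:
    # Average over every dimension except the channel dimension (dim 1).
    channels_sum += batch.mean(dim=[0, 2, 3, 4])
    channels_sqrd_sum += (batch ** 2).mean(dim=[0, 2, 3, 4])

mean = channels_sum / len(batches)
# var[X] = E[X**2] - E[X]**2
std = (channels_sqrd_sum / len(batches) - mean ** 2) ** 0.5

# Reference values computed in one pass over all the data.
all_data = torch.cat(batches, dim=0)
ref_mean = all_data.mean(dim=[0, 2, 3, 4])
centered = all_data - ref_mean.view(1, 3, 1, 1, 1)
ref_std = (centered ** 2).mean(dim=[0, 2, 3, 4]).sqrt()

print(mean, ref_mean)
print(std, ref_std)
```

Averaging per-batch means reproduces the global mean only when every batch has the same size; with a smaller final batch, the result is an approximation.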
-
What can be the reasons of BatchNorm working and Dropout not working in YoloV1 Pytorch implementation?
I then found Aladdin Persson's implementation (which he described in a YouTube video). He said that the original paper used Dropout because BatchNorm had not been invented at the time, and that he wanted to use BatchNorm instead. I thought there was no critical difference between the two, and decided to stick with the paper for the sake of learning to implement such things.
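For context, the two layers do quite different things to the same activations. The sketch below is a minimal illustration, assuming a small hypothetical convolution and random input; it is not taken from the YOLOv1 paper or from the repository, and it does not by itself explain the training behaviour described in the post.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical layers and input sizes, chosen only for illustration.
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(4)
drop = nn.Dropout(p=0.5)
bn.train()
drop.train()

# Deliberately badly scaled input.
x = torch.randn(8, 3, 16, 16) * 5 + 2
features = conv(x)

# BatchNorm rescales each channel to roughly zero mean and unit variance
# using the statistics of the current batch.
normalized = bn(features)
bn_mean = normalized.mean(dim=[0, 2, 3])
bn_var = (normalized ** 2).mean(dim=[0, 2, 3]) - bn_mean ** 2

# Dropout zeroes a random subset of activations (and scales the rest by
# 1 / (1 - p)); it does not normalize anything.
dropped = drop(features)
zero_fraction = (dropped == 0).float().mean().item()

print(bn_mean, bn_var, zero_fraction)
```

In evaluation mode (`.eval()`), BatchNorm switches to its running statistics and Dropout becomes a no-op, so the two also differ in how they behave at inference time.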
-
How to create a custom parallel corpus for machine translation with recent versions of pytorch and torchtext?
I am trying to train an NMT model on a custom dataset. I found this great tutorial on YouTube along with the accompanying repo, but it uses an old version of PyTorch and torchtext. More recent versions of torchtext have removed the Field and BucketIterator classes. I looked for more recent tutorials. The closest thing I could find was this Medium post (again with the accompanying code), which worked with a custom dataset for text classification. I tried to replicate the code with my problem and got this far:
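The poster's code is not included in this excerpt. As a separate, rough sketch of one way to build a parallel corpus without `Field` or `BucketIterator`, the example below uses only `torch.utils.data` and `pad_sequence`. The whitespace tokenizer, the special-token ids, the `build_vocab` helper, the `ParallelCorpus` class, and the sample sentences are all hypothetical choices for illustration.

```python
from collections import Counter

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

# Assumed special-token ids; any consistent choice would work.
PAD, SOS, EOS, UNK = 0, 1, 2, 3
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(sentences, min_freq=1):
    """Map each whitespace token to an integer id (hypothetical helper)."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    itos = SPECIALS + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


class ParallelCorpus(Dataset):
    """Pairs of (source, target) sentences turned into id tensors."""

    def __init__(self, pairs):
        self.pairs = pairs
        self.src_vocab = build_vocab(src for src, _ in pairs)
        self.tgt_vocab = build_vocab(tgt for _, tgt in pairs)

    def __len__(self):
        return len(self.pairs)

    def encode(self, sentence, vocab):
        ids = [vocab.get(tok, UNK) for tok in sentence.lower().split()]
        return torch.tensor([SOS] + ids + [EOS], dtype=torch.long)

    def __getitem__(self, idx):
        src, tgt = self.pairs[idx]
        return self.encode(src, self.src_vocab), self.encode(tgt, self.tgt_vocab)


def collate(batch):
    """Pad every sequence in the batch to the length of the longest one."""
    src, tgt = zip(*batch)
    return (
        pad_sequence(list(src), batch_first=True, padding_value=PAD),
        pad_sequence(list(tgt), batch_first=True, padding_value=PAD),
    )


# Made-up sample data.
pairs = [
    ("ein hund rennt", "a dog runs"),
    ("die katze sitzt auf dem sofa", "the cat sits on the sofa"),
    ("hallo", "hello"),
]
dataset = ParallelCorpus(pairs)
loader = DataLoader(dataset, batch_size=3, shuffle=False, collate_fn=collate)
src_batch, tgt_batch = next(iter(loader))
print(src_batch.shape, tgt_batch.shape)
```

This sketch pads each batch to its own longest sentence but does not group sentences of similar length the way `BucketIterator` did; sorting or a custom batch sampler would be needed for that.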
-
I need help knowing how to improve a CycleGan I am working on.
with the source code here: Machine-Learning-Collection/ML/Pytorch/GANs/CycleGAN at master · aladdinpersson/Machine-Learning-Collection · GitHub
-
Pytorch: Custom Dataset for Machine Translation
seq2seq_attention
-
Project to prettify music notes
I followed a tutorial to build a pix2pix GAN here: https://www.youtube.com/watch?v=SuddDSqGRzg, and the GitHub repository.
-
Awesome Youtube
Aladdin Persson
-
Help with initial set-up.
Hello everyone. I am all set up and ready to go. I downloaded the code below (btw, I don't fully understand what it does, lol) and ran it a few times.
-
YOLOv3 from scratch in PyTorch
Code: https://github.com/aladdinpersson/Machine-Learning-Collection/tree/master/ML/Pytorch/object_detection/YOLOv3
a-PyTorch-Tutorial-to-Image-Captioning
-
[R] end-to-end image captioning
I found this repository: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning. It seemingly requires only images and captions, but it is quite old (3 years) and is based on LSTMs. I was hoping there are transformer-based implementations that I could use.
What are some alternatives?
6DRepNet - Official Pytorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.
meshed-memory-transformer - Meshed-Memory Transformer for Image Captioning. CVPR 2020
nodding-pigeon - Detection and classification of head gestures in videos
BLIP - PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
ALAE - [CVPR2020] Adversarial Latent Autoencoders
image-to-latex - Convert images of LaTeX math equations into LaTeX code.
Data-Efficient-Reinforcement-Learning-with-Probabilistic-Model-Predictive-Control - Unofficial Implementation of the paper "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control", applied to gym environments
pytorch-tutorial - PyTorch Tutorial for Deep Learning Researchers
Gradient-Centralization-TensorFlow - Instantly improve your training performance of TensorFlow models with just 2 lines of code!
catr - Image Captioning Using Transformer
pytorch-accelerated - A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal, but extensible training loop which is flexible enough to handle the majority of use cases, and capable of utilizing different hardware options with no code changes required. Docs: https://pytorch-accelerated.readthedocs.io/en/latest/
clip-glass - Repository for "Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search"