| | mmf | CapDec |
|---|---|---|
| Mentions | 2 | 3 |
| Stars | 5,417 | 169 |
| Growth | 0.1% | - |
| Activity | 5.5 | 5.6 |
| Last commit | 2 months ago | 3 months ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mmf
- Context in first comment
  mmf, a multimodal PyTorch framework by Facebook Research, was released around 2-3 years ago and is now poorly maintained.
- [N] TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
  How is this different from mmf? https://github.com/facebookresearch/mmf
CapDec
- Open source – Unsupervised captioning getting closer to supervised captioning
- Reverse engineer Stable Diffusion images
  Cool! I also have a project that does image captioning: https://github.com/DavidHuji/CapDec
- CapDec: SOTA Zero Shot Image Captioning Using Clip and GPT2
What are some alternatives?
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
pytorch-widedeep - A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
smgeo - Geolocation Inference for Reddit
3DCoMPaT-v2 - 3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition
asteroid - The PyTorch-based audio source separation toolkit for researchers
DeepViewAgg - [CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
mayavoz - Pytorch based speech enhancement toolkit.
MAGIC - Language Models Can See: Plugging Visual Controls in Text Generation
multimodal - TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
x-clip - A concise but complete implementation of CLIP with various experimental improvements from recent papers
img2dataset - Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
LAVIS - A One-stop Library for Language-Vision Intelligence