| | torchscale | Multimodal-GPT |
|---|---|---|
| Mentions | 2 | 4 |
| Stars | 2,927 | 1,407 |
| Growth | 1.6% | 1.8% |
| Activity | 7.2 | 5.4 |
| Latest commit | 25 days ago | 11 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
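The site does not publish its exact activity formula, but a recency-weighted commit score of this general kind can be sketched as follows. The exponential decay and the 90-day half-life here are illustrative assumptions, not the tracker's actual method:

```python
import math
from datetime import date, timedelta

def activity_score(commit_dates, today, half_life_days=90):
    """Toy recency-weighted activity score: each commit contributes a
    weight that halves every `half_life_days` days, so recent commits
    count more than older ones (assumed formula, for illustration only)."""
    return sum(
        math.exp(-math.log(2) * (today - d).days / half_life_days)
        for d in commit_dates
    )

# Two repos with the same commit count: the one whose commits are
# recent scores higher than the one whose commits are ~a year old.
today = date(2024, 1, 1)
recent = [today - timedelta(days=n) for n in (1, 5, 10)]
stale = [today - timedelta(days=n) for n in (300, 330, 360)]
print(activity_score(recent, today) > activity_score(stale, today))  # True
```

Under a scheme like this, a repo last touched "25 days ago" naturally outscores one last touched "11 months ago", matching the 7.2 vs 5.4 gap in the table above.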
torchscale
- Retentive Network: A Successor to Transformer Implemented in PyTorch

  A retnet commit has now appeared in Microsoft's torchscale repo:
  https://github.com/microsoft/torchscale/commit/bf65397b26469...
- [R] TorchScale: Transformers at Scale - Microsoft 2022, Shuming Ma et al. - Improves modeling generality and capability, as well as training stability and efficiency.
Multimodal-GPT
- Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans
- Breaking: OpenAI plans to release its own open-source chatbot AI as it comes under competitive pressure. My analysis on what this means for ChatGPT and LLMs.

  A number of them have popped up as training methods to introduce multimodality have proliferated. Here's one: https://mmgpt.openmmlab.org.cn/
- MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
- Train a multi-modal chatbot with visual and language instructions
What are some alternatives?
towhee - Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
fairscale - PyTorch extensions for high performance and large scale training.
ONE-PEACE - A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
bertviz - BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
mPLUG-Owl - mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
extreme-bert - ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT”.
InternGPT - InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
xformers - Hackable and optimized Transformers building blocks, supporting a composable construction.
glami-1m - The largest multilingual image-text classification dataset. It contains fashion products.
transformers - 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
RetNet - An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"