Top 7 Python multimodality Projects
- big-sleep: A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun
- multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA, or CogVLM. 🔥
- Woodpecker: ✨✨ Hallucination correction for Multimodal Large Language Models, the first work to correct hallucinations in MLLMs. (by BradyFU)
- clip-guided-diffusion: A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.
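big-sleep and clip-guided-diffusion share one core idea: an image is iteratively adjusted so that its CLIP embedding moves closer to the embedding of the prompt text. As a toy illustration of that steering loop, here is a minimal NumPy sketch that replaces CLIP and the image generator with plain vectors and runs gradient ascent on cosine similarity. Every name here is a stand-in for illustration, not code from either project.

```python
import numpy as np

# Toy sketch of CLIP-style guidance: nudge a latent vector so its
# (stand-in) embedding moves toward a target text embedding via
# gradient ascent on cosine similarity. The real projects use CLIP
# embeddings and a BigGAN / diffusion decoder; plain vectors stand
# in for both here.

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def guide(z, target, lr=0.5, steps=500):
    """Gradient ascent on cos(z, target) with respect to z."""
    t = target / np.linalg.norm(target)
    for _ in range(steps):
        zn = np.linalg.norm(z)
        # d/dz cos(z, t) for unit t:  t/|z| - (z.t) z / |z|^3
        grad = t / zn - (z @ t) * z / zn**3
        z = z + lr * grad
    return z

rng = np.random.default_rng(0)
target = rng.normal(size=64)   # stands in for a CLIP text embedding
z = rng.normal(size=64)        # stands in for a generator latent

before = cosine(z, target)
after = cosine(guide(z, target), target)
print(round(before, 3), round(after, 3))  # similarity should increase
```

In the real tools the gradient flows through a frozen CLIP image encoder back into the generator's latent (big-sleep) or into each diffusion denoising step (clip-guided-diffusion), but the optimization loop has this same shape.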
Project mention: Show HN: Multimodal Maestro – Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29
Woodpecker: Hallucination Correction for Multimodal Large Language Models https://github.com/BradyFU/Woodpecker
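Woodpecker's actual pipeline (key-concept extraction, question formulation, visual validation, and correction) is described in the linked repo. As a rough, hedged illustration of the validation idea only, the sketch below flags objects that an MLLM answer mentions but an external detector did not find. The vocabulary, function names, and data shapes are hypothetical, not Woodpecker's API.

```python
# Much-simplified sketch in the spirit of Woodpecker's validation step:
# compare objects claimed in an MLLM answer against objects an external
# detector actually found, and flag the unsupported claims.
# Vocabulary and names here are illustrative assumptions.

OBJECT_VOCAB = {"dog", "cat", "frisbee", "car", "tree", "person"}

def claimed_objects(answer: str) -> set:
    """Naive key-concept extraction: vocabulary words present in the answer."""
    words = {w.strip(".,!?").lower() for w in answer.split()}
    return words & OBJECT_VOCAB

def unsupported_claims(answer: str, detections: set) -> set:
    """Objects the answer mentions that the detector did not find."""
    return claimed_objects(answer) - detections

answer = "A dog is catching a frisbee while a cat watches from a car."
detections = {"dog", "frisbee"}  # pretend object-detector output
print(sorted(unsupported_claims(answer, detections)))
```

The real system goes further: unsupported claims are re-verified with targeted visual questions and the answer is rewritten, rather than merely flagged.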
Project mention: GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | /r/LocalLLaMA | 2023-07-09

Instruction tuning large language models (LLMs) on image-text pairs has achieved unprecedented vision-language multimodal abilities. However, their vision-language alignments are built only at the image level; the lack of region-level alignment limits their advancement toward fine-grained multimodal understanding. In this paper, we propose instruction tuning on regions of interest. The key design is to reformulate the bounding box as a spatial instruction. The interleaved sequence of visual features extracted by the spatial instruction and the language embedding is fed to the LLM and trained on region-text data transformed into instruction-tuning format. Our region-level vision-language model, termed GPT4RoI, brings a brand-new conversational and interactive experience beyond image-level understanding. (1) Controllability: users can interact with our model through both language and spatial instructions to flexibly adjust the detail level of the question. (2) Capacity: our model supports not only single-region but also multi-region spatial instructions, unlocking region-level multimodal capacities such as detailed region captioning and complex region reasoning. (3) Composition: any off-the-shelf object detector can serve as a spatial instruction provider, mining informative object attributes from our model, like color, shape, material, action, and relation to other objects. The code, dataset, and demo can be found at https://github.com/jshilong/GPT4RoI.
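The abstract's key design, reformulating a bounding box as a "spatial instruction" interleaved with text, can be pictured with a small sketch. The exact token format lives in the GPT4RoI repo; the `<region>` template and normalization below are assumptions for illustration only.

```python
# Illustrative sketch of turning a region of interest into a "spatial
# instruction" token that can be interleaved with a language prompt,
# as described in the GPT4RoI abstract. The <region> template and the
# normalization scheme are assumptions, not GPT4RoI's tokenization.

def bbox_to_spatial_instruction(bbox, img_w, img_h):
    """Normalize a pixel bbox (x1, y1, x2, y2) into a region token."""
    x1, y1, x2, y2 = bbox
    coords = (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)
    return "<region>" + ",".join(f"{c:.3f}" for c in coords) + "</region>"

# Interleave the region token with a language instruction.
token = bbox_to_spatial_instruction((64, 32, 320, 256), img_w=640, img_h=480)
prompt = f"What is the person in {token} doing?"
print(prompt)
```

In the model itself, the region token is replaced by visual features pooled from that box (e.g., via RoIAlign) before the sequence reaches the LLM; the string form above only shows how the box rides along inside the instruction.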
Python multimodality related posts
- Is creating a StableDiffusion-inspired model feasible for my Master's thesis?
- TEDx talk on how to prepare for a career in vfx with the rapid changes caused by AI / machine learning
- Any good ai art websites that work with pokemon?
- Explore generative art with me
- What do you guys think of LaMDA?
- GitHub - lucidrains/big-sleep: A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
- I gave an AI program the word "Sweden"
A note from our sponsor, WorkOS (workos.com) | 28 Apr 2024
Index
What are some of the best open-source multimodality projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | big-sleep | 2,548 |
| 2 | multimodal-maestro | 942 |
| 3 | FEDOT | 605 |
| 4 | Woodpecker | 534 |
| 5 | GPT4RoI | 450 |
| 6 | clip-guided-diffusion | 440 |
| 7 | dance | 323 |