Python multimodality

Open-source Python projects categorized as multimodality

Top 7 Python multimodality Projects

  • big-sleep

    A simple command-line tool for text-to-image generation using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

  • multimodal-maestro

    Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

  • Project mention: Show HN: Multimodal Maestro – Prompt tools for use with LMMs | news.ycombinator.com | 2023-11-29
  • FEDOT

    Automated modeling and machine learning framework FEDOT

  • Woodpecker

    ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs. (by BradyFU)

  • Project mention: Shining the spotlight on CogVLM | /r/LocalLLaMA | 2023-12-09

    Woodpecker: Hallucination Correction for Multimodal Large Language Models https://github.com/BradyFU/Woodpecker

  • GPT4RoI

    GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

  • Project mention: GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | /r/LocalLLaMA | 2023-07-09

    Instruction tuning large language models (LLMs) on image-text pairs has achieved unprecedented vision-language multimodal abilities. However, their vision-language alignments are built only at the image level; the lack of region-level alignment limits their progress toward fine-grained multimodal understanding. In this paper, we propose instruction tuning on regions of interest. The key design is to reformulate the bounding box as a spatial instruction. The interleaved sequence of visual features extracted by the spatial instruction and the language embeddings is input to the LLM, which is trained on region-text data transformed into instruction-tuning format. Our region-level vision-language model, termed GPT4RoI, brings a brand-new conversational and interactive experience beyond image-level understanding:

    (1) Controllability: Users can interact with our model through both language and spatial instructions to flexibly adjust the level of detail of the question.
    (2) Capacity: Our model supports not only single-region but also multi-region spatial instructions. This unlocks more region-level multimodal capacities such as detailed region captioning and complex region reasoning.
    (3) Composition: Any off-the-shelf object detector can serve as a spatial instruction provider, so as to mine informative object attributes from our model, such as color, shape, material, action, and relation to other objects.

    The code, dataset, and demo can be found at https://github.com/jshilong/GPT4RoI.
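    The core idea in the abstract — reformulating a bounding box as a "spatial instruction" embedded in the prompt — can be illustrated with a small sketch. This is a hypothetical simplification, not GPT4RoI's actual implementation: the function name, the `<region>`/`<box>` placeholder tokens, and the normalized-coordinate format are all illustrative assumptions. In the real model, the placeholder position would receive region-level visual features rather than a text token.

    ```python
    def box_to_spatial_instruction(prompt: str, boxes, image_size):
        """Replace each <region> placeholder in the prompt with a token
        encoding the bounding box, normalized to [0, 1] by image size."""
        w, h = image_size
        for x1, y1, x2, y2 in boxes:
            token = f"<box>{x1/w:.3f},{y1/h:.3f},{x2/w:.3f},{y2/h:.3f}</box>"
            # Substitute placeholders one at a time, in order.
            prompt = prompt.replace("<region>", token, 1)
        return prompt

    print(box_to_spatial_instruction(
        "What is the object in <region> doing?",
        [(64, 32, 192, 160)],
        (256, 256),
    ))
    # → What is the object in <box>0.250,0.125,0.750,0.625</box> doing?
    ```

    With this framing, any off-the-shelf detector that emits boxes can act as the "spatial instruction provider" the abstract mentions, since its output slots directly into the prompt.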

  • clip-guided-diffusion

    A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.

  • dance

    DANCE: a deep learning library and benchmark platform for single-cell analysis (by OmicsML)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates how often a repo was mentioned in the last 12 months, or since we started tracking (Dec 2020).

Index

What are some of the best open-source multimodality projects in Python? This list will help you:

  #  Project                Stars
  1  big-sleep              2,548
  2  multimodal-maestro       942
  3  FEDOT                    605
  4  Woodpecker               534
  5  GPT4RoI                  450
  6  clip-guided-diffusion    440
  7  dance                    323
