Python multi-modal

Open-source Python projects categorized as multi-modal

Top 18 Python multi-modal Projects

  • modelscope

    ModelScope: bring the notion of Model-as-a-Service to life.

  • CogVLM

    a state-of-the-art-level open visual language model | multi-modal pre-trained model

    Project mention: Mixtral: Mixture of Experts | news.ycombinator.com | 2024-01-08

    CogVLM is very good in my (brief) testing: https://github.com/THUDM/CogVLM

    The model weights seem to be under a non-commercial license, not true open source, but it is "open access" as you requested.

  • DALLE-pytorch

    Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

  • marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

    Project mention: AI Search That Understands the Way Your Customer's Think | news.ycombinator.com | 2024-05-28
  • Chinese-CLIP

    Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

  • DeepKE

    [EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

  • Video-LLaVA

    【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

  • docarray

    Represent, send, store and search multimodal data

  • CogVLM2

    GPT4V-level open-source multi-modal model based on Llama3-8B

    Project mention: AIM Weekly 27 May 2024 | dev.to | 2024-05-28
  • LISA

    Project Page for "LISA: Reasoning Segmentation via Large Language Model"

    Project mention: SamGIS - Some notes on Segment Anything | dev.to | 2024-05-27

  • GPTDiscord

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

  • MotionGPT

    [NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs

  • SALMONN

    SALMONN: Speech Audio Language Music Open Neural Network

  • transfusion-pytorch

    Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

    Project mention: Transfusion: Predict the Next Token and Diffuse Images with One Multimodal Model | news.ycombinator.com | 2024-09-10

    Doesn't appear to be any weights uploaded anywhere that I can find.

    There are the starts of two (non-original-author) public implementations available on Github, but again -- doesn't appear to be any pretrained weights in either.

    * https://github.com/lucidrains/transfusion-pytorch

    * https://github.com/VachanVY/Transfusion.torch

  • UniControl

    Unified Controllable Visual Generation Model

  • zeta

    Build high-performance AI models with modular building blocks (by kyegomez)

    Project mention: Zetascale, Build high-performance AI models with modular building blocks | news.ycombinator.com | 2024-02-09
  • VLDet

    [ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

  • vlm-api

    REST API for computing cross-modal similarity between images and text using the ColPaLI vision-language model

    Project mention: Show HN: Documind – Open-source AI tool to turn documents into structured data | news.ycombinator.com | 2024-11-18

    VLMs are cool - they generate embeddings of the images themselves (as a collection of patches) and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. Here's an open-source API demo I built if you want to try it out: https://github.com/DataFog/vlm-api
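
The heatmap idea in the vlm-api mention above can be illustrated with a minimal sketch. It assumes late-interaction (MaxSim) scoring over patch embeddings, which is how ColPali-style models are usually described; it is not taken from the vlm-api code. The function names, the toy 2-D embeddings, and the 2x2 patch grid are all hypothetical stand-ins for real model output.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def patch_heatmap(query_emb, patch_emb, cols):
    """Return (score, heatmap) for one query against one page image.

    query_emb: list of query-token embeddings (hypothetical model output).
    patch_emb: list of image-patch embeddings in row-major grid order.
    cols: number of patches per grid row.
    """
    q = [normalize(v) for v in query_emb]
    p = [normalize(v) for v in patch_emb]
    # sim[i][j] = cosine similarity between query token i and patch j.
    sim = [[sum(a * b for a, b in zip(qv, pv)) for pv in p] for qv in q]
    # MaxSim score: take the best-matching patch for each query token, then sum.
    score = sum(max(row) for row in sim)
    # Heatmap value per patch: the strongest response from any query token.
    per_patch = [max(row[j] for row in sim) for j in range(len(p))]
    heatmap = [per_patch[r:r + cols] for r in range(0, len(per_patch), cols)]
    return score, heatmap

# Toy data: two query tokens and a 2x2 grid of patches (assumed values).
query = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
score, heatmap = patch_heatmap(query, patches, cols=2)
print(score)
print(heatmap)
```

Patches that align with some query token get values near 1 in the grid, which is what a viewer would render as the hot regions of the document.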

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python multi-modal related posts

  • Show HN: I built an open source AI video search engine to learn more about AI

    2 projects | news.ycombinator.com | 19 Dec 2023
  • CogAgent-18B – visual-based GUI Agent capabilities

    2 projects | news.ycombinator.com | 16 Dec 2023
  • What do you think. When should we expect the next SDXL version?

    1 project | /r/StableDiffusion | 10 Dec 2023
  • shinning the spotlight on CogVLM

    3 projects | /r/LocalLLaMA | 9 Dec 2023
  • Gemini: Google's most capable AI model yet

    2 projects | news.ycombinator.com | 6 Dec 2023
  • Open-source LLMs with Image Interpretation

    1 project | /r/LocalLLaMA | 6 Dec 2023
  • FLaNK Stack Weekly for 27 November 2023

    28 projects | dev.to | 27 Nov 2023

Index

What are some of the best open-source multi-modal projects in Python? This list will help you:

 #  Project              Stars
 1  modelscope           7,057
 2  CogVLM               6,147
 3  DALLE-pytorch        5,578
 4  marqo                4,662
 5  Chinese-CLIP         4,611
 6  DeepKE               3,591
 7  Video-LLaVA          3,024
 8  docarray             2,988
 9  CogVLM2              2,139
10  LISA                 1,895
11  GPTDiscord           1,826
12  MotionGPT            1,514
13  SALMONN              1,065
14  transfusion-pytorch    752
15  UniControl             624
16  zeta                   430
17  VLDet                  185
18  vlm-api                  3
