vision-and-language

Open-source projects categorized as vision-and-language

Top 16 vision-and-language Open-Source Projects

  • LAVIS

    LAVIS - A One-stop Library for Language-Vision Intelligence

    Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11

  • Multimodal-GPT

    Multimodal-GPT

    Project mention: Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans | /r/machinelearningnews | 2023-05-19

  • prismer

    The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  • Oscar

    Oscar and VinVL

  • ONE-PEACE

    A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

    Project mention: A general representation modal across vision, audio, language modalities | news.ycombinator.com | 2023-05-25

  • AlphaCLIP

    [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

    Project mention: CVPR 2024 Survival Guide: Five Vision-Language Papers You Don’t Want to Miss | dev.to | 2024-04-15

  • pytorch_mgie

    A Gradio demo of MGIE

    Project mention: Guiding Instruction-Based Image Editing via Multimodal Large Language Models | news.ycombinator.com | 2024-02-13

  • conceptual-12m

    Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

  • CLIP-Caption-Reward

    PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

  • VL_adapter

    PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)

  • ALPRO

    Align and Prompt: Video-and-Language Pre-training with Entity Prompts

  • VLDet

    [ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

  • DallEval

    DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)

  • multimodal

    A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" (by cdancette)

  • robo-vln

    PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

  • zeroshot-storytelling

    GitHub repository for Zero Shot Visual Storytelling

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

vision-and-language related posts

  • [D] Why is most Open Source AI happening outside the USA?

    2 projects | /r/MachineLearning | 6 Dec 2023
  • Need help for a colab notebook running Lavis blip2_instruct_vicuna13b?

    1 project | /r/GoogleColab | 25 Jun 2023
  • most sane web3 job listing

    2 projects | /r/ProgrammerHumor | 29 May 2023
  • I work at a non-tech company and have been asked to make software that is impossible. How do I explain it to my boss?

    2 projects | /r/cscareerquestions | 26 May 2023
  • Two-minute Daily AI Update (Date: 5/15/2023)

    1 project | /r/ChatGPT | 15 May 2023
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    1 project | /r/MachineLearning | 14 May 2023
  • Is there a process that is the opposite of image generation?

    1 project | /r/StableDiffusion | 31 Jan 2023

Index

What are some of the best open-source vision-and-language projects? This list will help you:

#   Project                 Stars
1   LAVIS                   8,738
2   Multimodal-GPT          1,407
3   prismer                 1,288
4   Oscar                   1,027
5   ONE-PEACE                 847
6   AlphaCLIP                 498
7   pytorch_mgie              320
8   conceptual-12m            305
9   CLIP-Caption-Reward       225
10  VL_adapter                193
11  ALPRO                     180
12  VLDet                     169
13  DallEval                  133
14  multimodal                 70
15  robo-vln                   61
16  zeroshot-storytelling      15
