Top 13 multimodal-deep-learning Open-Source Projects

LAVIS

18 8,738 6.3 Jupyter Notebook

LAVIS - A One-stop Library for Language-Vision Intelligence

Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11

BentoML

16 6,558 9.8 Python

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

Project mention: Who's hiring developer advocates? (December 2023) | dev.to | 2023-12-04

Awesome-Text-to-Image

1 1,878 9.1

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
pytorch-widedeep

7 1,238 8.5 Python

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Time-LLM

1 742 7.3 Python

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"

Project mention: karpathy/llm.c | news.ycombinator.com | 2024-04-08

Yes, general LLMs can be used for time series forecasting:
https://github.com/KimMeen/Time-LLM

blended-latent-diffusion

1 509 4.5 Jupyter Notebook

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
scarches

1 310 6.9 Jupyter Notebook

Reference mapping for single-cell genomics
CLoT

1 219 7.5 Python

Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation" (CVPR 2024) (by sail-sg)

Project mention: CVPR 2024 Survival Guide: Five Vision-Language Papers You Don’t Want to Miss | dev.to | 2024-04-15

DeepViewAgg

4 215 4.8 Python

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
CapDec

3 169 5.6 Python

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Project mention: Open source – Unsupervised captioning getting closer to supervised captioning | news.ycombinator.com | 2024-04-20

VQASynth

1 74 8.6 Python

Compose multimodal datasets 🎹

Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23

3DCoMPaT-v2

2 69 5.8 Python

3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition

Project mention: [D] 3DCoMPaT Challenge: Tag materials and parts on 3D Models. 3K$ USD price pool | /r/MachineLearning | 2023-05-10

Multimodal

1 8 0.0 Jupyter Notebook

Listen. Write. Speak. Read. Think. (by kritiksoman)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

multimodal-deep-learning related posts

Open source – Unsupervised captioning getting closer to supervised captioning
1 project | news.ycombinator.com | 20 Apr 2024

[D] 3DCoMPaT Challenge: Tag materials and parts on 3D Models. 3K$ USD price pool
1 project | /r/MachineLearning | 10 May 2023

Reverse engineer Stable Diffusion images
2 projects | news.ycombinator.com | 8 Feb 2023

[R] [CVPR 2022 Oral] Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation
2 projects | /r/MachineLearning | 11 May 2022

Index

What are some of the best open-source multimodal-deep-learning projects? This list will help you:

#	Project	Stars
1	LAVIS	8,738
2	BentoML	6,558
3	Awesome-Text-to-Image	1,878
4	pytorch-widedeep	1,238
5	Time-LLM	742
6	blended-latent-diffusion	509
7	scarches	310
8	CLoT	219
9	DeepViewAgg	215
10	CapDec	169
11	VQASynth	74
12	3DCoMPaT-v2	69
13	Multimodal	8