Top 7 Python multimodal-learning Projects

open_flamingo

4 3,459 7.8 Python

An open-source framework for training large multimodal models.

Project mention: Are there any multimodal AI models I can use to provide a paired text *and* image input, to then generate an expanded descriptive text output? [D] | /r/MachineLearning | 2023-07-05

Maybe the recent OpenFlamingo gives you better results (they have a demo on HF).

Multimodal-Toolkit

2 553 7.6 Python

Multimodal model for text and tabular data, with HuggingFace transformers as the building block for text data
XPretrain

1 436 1.0 Python

Multi-modality pre-training

Project mention: CVPR 2024 Datasets and Benchmarks - Part 1: Datasets | dev.to | 2024-04-23

It was created by curating 3.8 million high-resolution videos from the publicly available HD-VILA-100M dataset. The dataset is created following these three steps:

pykale

2 427 9.1 Python

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
UPop

1 83 8.4 Python

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

Project mention: Show HN: Compress vision-language and unimodal AI models by structured pruning | news.ycombinator.com | 2023-07-31

valhalla-nmt

1 26 0.9 Python

Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
Coin-CLIP

2 9 7.9 Python

Coin-CLIP: a CLIP model fine-tuned on a large collection of coin images using contrastive learning. It improves feature extraction for coins, which boosts image search accuracy. The model combines a Vision Transformer (ViT) with CLIP's multimodal learning and is optimized for numismatic applications.

Project mention: New Multimodal Model Coin-CLIP for Coin Identification/Recognition | /r/Multimodal | 2023-12-08
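
The image-search use described for Coin-CLIP usually comes down to a nearest-neighbor lookup over image embeddings. The following is a minimal sketch of that lookup, assuming the embeddings have already been extracted by some image encoder. The function names and the toy 4-dimensional vectors are hypothetical illustrations and are not taken from the Coin-CLIP repository.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_search(query, gallery, top_k=3):
    """Return (name, score) pairs for the top_k gallery items most similar to query."""
    q = l2_normalize(query)
    scored = []
    for name, emb in gallery.items():
        g = l2_normalize(emb)
        scored.append((name, sum(a * b for a, b in zip(q, g))))
    # Highest cosine similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings standing in for image-encoder output (assumption)
gallery = {
    "coin_a": [0.9, 0.1, 0.0, 0.2],
    "coin_b": [0.1, 0.8, 0.3, 0.0],
    "coin_c": [0.85, 0.15, 0.05, 0.25],
}
query = [0.88, 0.12, 0.02, 0.22]

results = cosine_search(query, gallery, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Real embeddings would have hundreds of dimensions, and a large gallery would normally use a vectorized or approximate nearest-neighbor index rather than this linear scan.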

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).