Top 23 Python multimodal Projects
-
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
-
courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI) (by SkalskiP)
-
tree-of-thoughts
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, which elevates model reasoning by at least 70%
-
img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
-
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
-
OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
-
autodistill
Images to inference with no labeling (use foundation models to train supervised models).
-
CoCa-pytorch
Implementation of CoCa (Contrastive Captioners are Image-Text Foundation Models) in PyTorch
-
uform
Pocket-sized multimodal AI for content understanding and generation across multilingual texts, images, and (soon) video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
-
ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
-
swarms
Orchestrate swarms of agents from any framework, such as OpenAI and LangChain, for real-world workflow automation. Join our community: https://discord.gg/DbjBMJTSWD
-
agentchain
Chain together LLMs for reasoning and orchestrate multiple large models to accomplish complex tasks
Project mention: The Era of 1-bit LLMs: Training Tips, Code and FAQ [pdf] | news.ycombinator.com | 2024-03-21
Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06
I haven't tested WaveRNN, but of the ones I know, the best open-source option is FastPitch. It's easy to use, and here is the tutorial for voice cloning.
Project mention: If you are looking for free courses about AI, LLMs, CV, or NLP, I created the repository with links to resources that I found super high quality and helpful. The link is in the comment. | /r/ChatGPT | 2023-07-02
I found it: https://github.com/SkalskiP/courses
Project mention: [D] Potential scammer on github stealing work of other ML researchers? | /r/MachineLearning | 2023-08-17
I checked the issues and found https://github.com/kyegomez/tree-of-thoughts/issues/78
Project mention: OpenAI sued for web scraping from millions of internet users in order to train ChatGPT | /r/ArtistHate | 2023-06-30
Lmao, no it doesn't. As we can see, their downloader uses very obscure "no ai" headers (which can be disabled, so it's useless). They only claim it respects "robots.txt" because the Google crawler respects it; if a site changes its robots.txt rules, they don't remove it from their dataset, and that is not "respecting". https://github.com/rom1504/img2dataset
You can also create an issue and ask the developers for help.
Project mention: Retentive Network: A Successor to Transformer Implemented in PyTorch | news.ycombinator.com | 2023-07-24
A RetNet commit has now appeared in Microsoft's torchscale repo:
https://github.com/microsoft/torchscale/commit/bf65397b26469...
Project mention: Show HN: NExT-GPT – First LLM working with multimodal input and output | news.ycombinator.com | 2023-09-21
Project mention: DocArray – Represent, send, and store multimodal data for ML | news.ycombinator.com | 2023-04-27
Project mention: Unleash the Power of Video-LLaMA: Revolutionizing Language Models with Video and Audio Understanding! | dev.to | 2023-06-12
We extend our deepest gratitude to the extraordinary projects that have influenced and contributed to the development of Video-LLaMA. We're indebted to MiniGPT-4, FastChat, BLIP-2, EVA-CLIP, ImageBind, LLaMA, VideoChat, LLaVA, WebVid, and mPLUG-Owl for their invaluable contributions. Special thanks to Midjourney for creating the stunning Video-LLaMA logo, encapsulating the essence of our groundbreaking project.
Roboflow | Open Source Software Engineer, Web Designer / Developer, and more. | Full-time (Remote, SF, NYC) | https://roboflow.com/careers?ref=whoishiring0224
Roboflow is the fastest way to use computer vision in production. We help developers give their software the sense of sight. Our end-to-end platform[1] provides tooling for image collection, annotation, dataset exploration and curation, training, and deployment.
Over 250k engineers (including engineers from two-thirds of Fortune 100 companies) build with Roboflow. We now host the largest collection of open source computer vision datasets and pre-trained models[2]. We are pushing forward the CV ecosystem with open source projects like Autodistill[3] and Supervision[4]. And we've built one of the most comprehensive resources for software engineers to learn to use computer vision with our popular blog[5] and YouTube channel[6].
We have several openings available but are primarily looking for strong technical generalists who want to help us democratize computer vision, like to wear many hats, and want an outsized impact. Our engineering culture is built on a foundation of autonomy, and we don't consider an engineer fully ramped until they can "choose their own loss function". At Roboflow, engineers aren't just responsible for building things but also for helping us figure out what we should build next. We're builders and problem solvers, not just coders. (For this reason we also especially love hiring past and future founders.)
We're currently hiring full-stack engineers for our ML and web platform teams, a web developer to bridge our product and marketing teams, several technical roles on the sales & field engineering teams, and our first applied machine learning researcher to help push forward the state of the art in computer vision.
[1]: https://roboflow.com/?ref=whoishiring0224
[2]: https://roboflow.com/universe?ref=whoishiring0224
[3]: https://github.com/autodistill/autodistill
[4]: https://github.com/roboflow/supervision
[5]: https://blog.roboflow.com/?ref=whoishiring0224
[6]: https://www.youtube.com/@Roboflow
Project mention: Meet MultiModal-GPT: A Vision and Language Model for Multi-Round Dialogue with Humans | /r/machinelearningnews | 2023-05-19
Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25
Question: are there any good on-device-sized image embedding models?
I tried https://github.com/unum-cloud/uform, which I do like, especially since it also supports languages other than English. Any recommendations for other alternatives?
Project mention: A general representation model across vision, audio, language modalities | news.ycombinator.com | 2023-05-25
Project mention: Swarms – Automating all digital activities with millions of autonomous AI Agents | news.ycombinator.com | 2023-07-10
Python multimodal related posts
- CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data
- Multimodal Embeddings for JavaScript, Swift, and Python
- VT.ai – Multi-Modal LLM Chat Application
- Ask HN: What are you building with AI?
- Show HN: I just open sourced my document/website extractor for Vision-LLMs
- Show HN: UForm v2 Featuring Multimodal Matryoshka, Multimodal DPO, and ONNX
- UForm v1: Multimodal Chat in 1.5B Parameters
A note from our sponsor - SaaSHub
www.saashub.com | 26 Apr 2024
Index
What are some of the best open-source multimodal projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | jina | 20,009
2 | unilm | 18,319
3 | NeMo | 10,021
4 | mmf | 5,415
5 | courses | 4,486
6 | tree-of-thoughts | 4,029
7 | discoart | 3,839
8 | img2dataset | 3,242
9 | mmpretrain | 3,156
10 | InternGPT | 3,121
11 | torchscale | 2,915
12 | NExT-GPT | 2,860
13 | docarray | 2,739
14 | OFA | 2,323
15 | mPLUG-Owl | 1,917
16 | autodistill | 1,529
17 | Multimodal-GPT | 1,401
18 | CoCa-pytorch | 973
19 | InternVideo | 909
20 | uform | 865
21 | ONE-PEACE | 838
22 | swarms | 650
23 | agentchain | 563