Top 22 Python foundation-model Projects
Scout Monitoring
Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
unilm
Project mention: A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images? | news.ycombinator.com | 2024-06-07
Has anyone tried Kosmos [0]? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.
[0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Project mention: Show HN: LLM Aided OCR (Correcting Tesseract OCR Errors with LLMs) | news.ycombinator.com | 2024-08-09
This package seems to use llama_cpp for local inference [1] so you can probably use anything supported by that [2]. However, I think it's just passing OCR output for correction - the language model doesn't actually see the original image.
That said, there are some large language models you can run locally which accept image input. Phi-3-Vision [3], LLaVA [4], MiniCPM-V [5], etc.
[1] - https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main...
[2] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#de...
[3] - https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
[4] - https://github.com/haotian-liu/LLaVA
[5] - https://github.com/OpenBMB/MiniCPM-V
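The text-only correction step described above can be sketched in a few lines. This is a hedged illustration, not the package's actual code: the prompt wording, function names, and the stand-in `demo_llm` are all assumptions; in practice the `llm` callable would wrap a local llama_cpp model.

```python
def build_correction_prompt(ocr_text: str) -> str:
    """Wrap raw OCR output in a correction instruction. The model only
    ever sees the extracted text, never the original image, so it can
    only fix errors that are recoverable from textual context."""
    return (
        "Correct any OCR errors in the following text. Fix misrecognized "
        "characters and broken words, but do not rephrase or add content.\n\n"
        "---\n" + ocr_text + "\n---"
    )

def correct_ocr(ocr_text: str, llm) -> str:
    """`llm` is any callable mapping a prompt to completion text,
    e.g. a wrapper around a local llama_cpp model."""
    return llm(build_correction_prompt(ocr_text))

# Stand-in "model" that fixes one classic OCR confusion ("m" read as "rn"):
demo_llm = lambda p: p.split("---")[1].strip().replace("rn", "m")
fixed = correct_ocr("The rnodel corrects errors.", demo_llm)
```

The point of the split is that any backend satisfying the one-argument callable contract can be swapped in without touching the prompt logic.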
Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
chronos-forecasting
Project mention: TimesFM (Time Series Foundation Model) for time-series forecasting | news.ycombinator.com | 2024-05-08
On a related note, Amazon also had a model for time series forecasting called Chronos.
https://github.com/amazon-science/chronos-forecasting
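Chronos's central idea is to treat forecasting as language modeling: the context window is mean-scaled and quantized into a small vocabulary of bins, so a transformer can predict the series token by token. A stdlib sketch of roughly that tokenization step (the bin count and clipping range here are illustrative choices, not the paper's settings):

```python
def tokenize_series(values, n_bins=10, low=-3.0, high=3.0):
    """Mean-absolute-scale a series, then map each value to a discrete
    bin id, so a language model can treat it as a token sequence."""
    scale = sum(abs(v) for v in values) / len(values) or 1.0
    width = (high - low) / n_bins
    tokens = []
    for v in values:
        s = min(max(v / scale, low), high - 1e-9)  # clip to the bin range
        tokens.append(int((s - low) / width))
    return tokens, scale

def detokenize(tokens, scale, n_bins=10, low=-3.0, high=3.0):
    """Map bin ids back to approximate values at the bin centers."""
    width = (high - low) / n_bins
    return [(low + (t + 0.5) * width) * scale for t in tokens]

tokens, scale = tokenize_series([1.0, 2.0, 3.0, 4.0])
approx = detokenize(tokens, scale)
```

Quantization loses at most half a bin width per value (in scaled units), which is the trade for getting a fixed, finite vocabulary.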
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
autodistill
Images to inference with no labeling (use foundation models to train supervised models).
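The pattern the one-liner above describes — a large foundation model auto-labels raw images, then a small supervised model is trained on the result — can be sketched with stand-in models. Everything below (the lambda "base model", the threshold "trainer") is a toy assumption to show the data flow, not autodistill's actual API:

```python
def distill(images, base_model, train_target):
    """Auto-label with the slow, general base model, then fit the fast,
    specialized target model on the generated dataset -- no human
    labeling in the loop."""
    dataset = [(img, base_model(img)) for img in images]
    return train_target(dataset)

# Stand-in "foundation model": labels an image by its mean brightness.
base = lambda img: "bright" if sum(img) / len(img) > 0.5 else "dark"

# Stand-in trainer: learns a threshold from the auto-labeled data.
def train_threshold(dataset):
    brights = [sum(i) / len(i) for i, lbl in dataset if lbl == "bright"]
    darks = [sum(i) / len(i) for i, lbl in dataset if lbl == "dark"]
    cut = (min(brights) + max(darks)) / 2
    return lambda img: "bright" if sum(img) / len(img) > cut else "dark"

images = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.9], [0.3, 0.1]]
student = distill(images, base, train_threshold)
```

The student inherits the base model's labeling behavior on this distribution while being far cheaper to run, which is the whole economic argument for distillation.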
Emu
Project mention: Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model | news.ycombinator.com | 2023-12-21
I'm excited to introduce Emu2, the latest generative multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). Emu2 is an open-source initiative that reflects BAAI's commitment to fostering open, secure, and responsible AI research. It's designed to enhance AI's proficiency in handling tasks across various modalities with minimal examples and straightforward instructions.
Emu2 has demonstrated superior performance over other large-scale models like Flamingo-80B in few-shot multimodal understanding tasks. It serves as a versatile base model for developers, providing a flexible platform for crafting specialized multimodal applications.
Key features of Emu2 include:
- A more streamlined modeling framework than its predecessor, Emu.
- A decoder capable of reconstructing images from the encoder's semantic space.
- An expansion to 37 billion parameters, boosting both capabilities and generalization.
BAAI has also released fine-tuned versions, Emu2-Chat for visual understanding and Emu2-Gen for visual generation, which stand as some of the most powerful open-source models available today.
Here are the resources for those interested in exploring or contributing to Emu2:
- Project: https://baaivision.github.io/emu2/
- Model: https://huggingface.co/BAAI/Emu2
- Code: https://github.com/baaivision/Emu/tree/main/Emu2
- Demo: https://huggingface.co/spaces/BAAI/Emu2
- Paper: https://arxiv.org/abs/2312.13286
We're eager to see how the HN community engages with Emu2 and we welcome your feedback to help us improve. Let's collaborate to push the boundaries of multimodal AI!
lag-llama
Project mention: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | news.ycombinator.com | 2024-02-26
ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
GRID-playground
Project mention: GRID: General Robot Intelligence Development Platform | news.ycombinator.com | 2023-10-17
aurora
Project mention: Open-source release of Aurora: a foundation model of the atmosphere | news.ycombinator.com | 2024-09-19
I'm impressed by how many new benchmarks the Qwen team ran. As the old benchmarks get saturated/overfit, new ones are of course required. Some of the latest ones they use include:
* MMLU-Pro https://github.com/TIGER-AI-Lab/MMLU-Pro - a new more challenging (and improved in other areas) version of MMLU that does a better job separating out the current top models
* MixEval(-Hard) https://github.com/Psycoy/MixEval - a very quick/cheap eval with high correlation w/ Chatbot Arena ELOs and (statistically correlated) dynamically swappable question sets
* Arena Hard https://github.com/lm-sys/arena-hard-auto - another automatic eval tool that uses LLM-as-a-Judge w/ high correlation w/ Chatbot Arena / human rankings
* LiveCodeBench https://livecodebench.github.io/ - a coding test with different categories based off of LeetCode problems that also lets you filter/compare scores by problem release month to see the impact of overfitting/contamination
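"High correlation with Chatbot Arena" in the claims above is usually a rank correlation between benchmark scores and Arena ELOs. A stdlib sketch of that check (the scores below are invented for illustration; ties are ignored for simplicity):

```python
def spearman_rank_correlation(xs, ys):
    """Spearman's rho: Pearson correlation computed on ranks, which is
    what 'agrees with Arena rankings' means in practice (ties ignored)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical benchmark scores vs. Arena ELOs for five models:
bench = [82.1, 75.4, 68.0, 61.2, 55.9]
elo = [1251, 1210, 1190, 1105, 1098]
rho = spearman_rank_correlation(bench, elo)  # ~1.0: identical rankings
```

A rho near 1.0 means the cheap benchmark orders models the same way human preference voting does, which is the property these eval tools advertise.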
meta-prompting
Official implementation of paper "Meta Prompting for AI Systems" (https://arxiv.org/abs/2311.11482)
Lexicon3D
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
This framework extracts features from various foundation models, constructs 3D feature embeddings as scene embeddings, and evaluates them on multiple downstream tasks. The paper presents a novel approach to representing complex indoor scenes using a combination of 2D and 3D modalities, such as posed images, videos, and 3D point clouds. The extracted feature embeddings from image- and video-based models are projected into 3D space using a multi-view 3D projection module for subsequent 3D scene evaluation tasks.
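Per view, the multi-view 3D projection described above reduces to standard pinhole unprojection: each pixel with known depth is lifted into camera space and carries its 2D feature vector along. A toy stdlib sketch of that geometric step (the intrinsics and the 2x2 feature map are made-up values, not Lexicon3D's actual pipeline):

```python
def unproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth into camera-space 3D using the
    pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def project_features_to_3d(feature_map, depth_map, fx, fy, cx, cy):
    """Attach each 2D feature vector to the 3D point its pixel
    back-projects to -- the per-view step before features from all
    views are aggregated into a scene embedding."""
    points = []
    for v, row in enumerate(feature_map):
        for u, feat in enumerate(row):
            z = depth_map[v][u]
            if z > 0:  # skip pixels with no valid depth
                points.append((unproject_pixel(u, v, z, fx, fy, cx, cy), feat))
    return points

# Toy 2x2 feature map, constant depth 2.0, centered principal point:
feats = [[(0.1,), (0.2,)], [(0.3,), (0.4,)]]
depth = [[2.0, 2.0], [2.0, 2.0]]
cloud = project_features_to_3d(feats, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Aggregating such per-view point-feature pairs from many cameras is what yields a 3D feature embedding for the whole scene.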
tf-gpt
Project mention: Show HN: TF-GPT – a TensorFlow implementation of a decoder-only transformer | news.ycombinator.com | 2024-06-26
Python foundation-models related posts
- MT-Bench: Comparing different LLM Judges
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
- Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model
- 25 million Creative Commons image dataset released!
- Show HN: Autodistill, automated image labeling with foundation vision models
- [P] AI image generation without copyright infringement
- This research project on reconstructing video stimulus to the brain using an MRI scanner and AI algorithms reminds me of the RDA brain reading technology
A note from our sponsor - Scout Monitoring
www.scoutapm.com | 14 Oct 2024
Index
What are some of the best open-source foundation-model projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | ColossalAI | 38,699 |
2 | unilm | 19,719 |
3 | LLaVA | 19,655 |
4 | Otter | 3,560 |
5 | NExT-GPT | 3,241 |
6 | Ask-Anything | 3,012 |
7 | chronos-forecasting | 2,413 |
8 | EVA | 2,244 |
9 | autodistill | 1,897 |
10 | Emu | 1,629 |
11 | InternVideo | 1,338 |
12 | lag-llama | 1,219 |
13 | ONE-PEACE | 946 |
14 | meerkat | 824 |
15 | MindVideo | 364 |
16 | fondant | 339 |
17 | GRID-playground | 260 |
18 | aurora | 218 |
19 | MixEval | 211 |
20 | meta-prompting | 88 |
21 | Lexicon3D | 33 |
22 | tf-gpt | 2 |