Top 18 Python foundation-model Projects
- LLaVA: [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
- Otter: 🦦 A multi-modal model based on OpenFlamingo (an open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT, showcasing improved instruction-following and in-context learning ability.
- Ask-Anything: [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
- autodistill: Images to inference with no labeling (use foundation models to train supervised models).
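The autodistill idea — a large foundation model auto-labels raw images, and those pseudo-labels train a small supervised model — can be sketched with a stdlib-only toy. The labeler and nearest-centroid "student" below are stand-ins for illustration, not the actual autodistill API; in a real pipeline the labeler would be a model like Grounded SAM and the student a detector such as YOLO.

```python
def foundation_label(image):
    """Stand-in for a foundation model: returns (label, confidence).
    Here an 'image' is just a feature vector."""
    total = sum(image)
    label = "dog" if total > 0 else "cat"
    confidence = 0.9 if abs(total) > 1 else 0.4
    return label, confidence

def autolabel(images, threshold=0.5):
    """Keep only confident pseudo-labels, mirroring the distillation step."""
    dataset = []
    for img in images:
        label, conf = foundation_label(img)
        if conf >= threshold:
            dataset.append((img, label))
    return dataset

def train_centroid_classifier(dataset):
    """Train a tiny supervised 'student' (nearest centroid) on pseudo-labels."""
    sums, counts = {}, {}
    for img, label in dataset:
        s = sums.setdefault(label, [0.0] * len(img))
        for i, v in enumerate(img):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

    def predict(img):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], img)),
        )
    return predict

images = [[2.0, 1.0], [-3.0, -1.0], [1.5, 2.0], [-2.0, -2.5]]
predict = train_centroid_classifier(autolabel(images))
print(predict([3.0, 2.0]))  # classified without any human labels
```

The student never sees a human label; its training set is entirely pseudo-labels that cleared the confidence threshold, which is the core of the approach.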
- ONE-PEACE: A general representation model across vision, audio, and language modalities. Paper: "ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities".
- meta-prompting: Official implementation of the BGPT @ ICLR 2024 paper "Meta Prompting for AI Systems" (https://arxiv.org/abs/2311.11482).
Project mention: The Era of 1-Bit LLMs: Training Tips, Code and FAQ [pdf] | news.ycombinator.com | 2024-03-21
Project mention: Show HN: I Remade the Fake Google Gemini Demo, Except Using GPT-4 and It's Real | news.ycombinator.com | 2023-12-10
Update: For anyone else facing the commercial-use question on LLaVA: it is licensed under Apache 2.0 and can be used commercially with attribution: https://github.com/haotian-liu/LLaVA/blob/main/LICENSE
Project mention: OpenAI vs Google, Detect ChatGPT Content with 99% accuracy, Navigating AI compute costs | /r/ChatGPT | 2023-06-15
👀 Video-LLaMA - Empower large language models with video and audio understanding capability. (link)
🦦 Otter - Multi-modal model with improved instruction-following and in-context learning ability.
🔗 Linkly.AI - AI-powered lead analytics and management platform that helps you track, analyze, and streamline your leads in one place.
🎬 Jet Cut Ready - AI plugin for Adobe Premiere Pro that automatically removes silent parts in videos. (link)
💬 HeyGen's ChatGPT Plugin - Convert text into high-quality videos using AI text and video generation.
Project mention: Show HN: NExT-GPT – First LLM working with multimodal input and output | news.ycombinator.com | 2023-09-21
There were some developments using LLMs in the time-series domain that caught my attention.
I toyed with the Chronos forecasting toolkit [1], and the results were predictably off by wild margins [2].
What really caught my eye, though, was the "feel" of the predicted timeseries -- this is the first time I've seen a synthetic timeseries that looks like the real thing. Stock charts have a certain quality to them; once you've been looking at them long enough, you can tell more often than not whether some unlabeled data is a stock-price timeseries or not. It seems the Chronos model was able to pick up on that "nature" of the price movement and replicate it in its forecasts. Impressive!
1: https://github.com/amazon-science/chronos-forecasting
2: https://imgur.com/a/hTRQ38d
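The "stock-like" texture the comment describes is largely the signature of a geometric random walk: multiplicative steps with roughly lognormal returns, so the level drifts and wanders but never goes negative. A minimal stdlib sketch of generating such a series (an illustration of that statistical shape, not of the Chronos API):

```python
import math
import random

def geometric_random_walk(n, s0=100.0, mu=0.0002, sigma=0.01, seed=42):
    """Simulate a price-like series: each step multiplies the level by
    exp(mu + sigma * z) with z ~ N(0, 1), so log-returns are Gaussian
    and the level stays strictly positive."""
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n - 1):
        z = rng.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp(mu + sigma * z))
    return prices

series = geometric_random_walk(252)  # roughly one trading year of daily closes
assert all(p > 0 for p in series)    # a multiplicative walk never crosses zero
```

A forecaster that has internalized this multiplicative structure will produce continuations that "look like" stock charts even when the point forecasts themselves are far off, which matches the observation above.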
Roboflow | Open Source Software Engineer, Web Designer / Developer, and more. | Full-time (Remote, SF, NYC) | https://roboflow.com/careers?ref=whoishiring0224
Roboflow is the fastest way to use computer vision in production. We help developers give their software the sense of sight. Our end-to-end platform[1] provides tooling for image collection, annotation, dataset exploration and curation, training, and deployment.
Over 250k engineers (including engineers from two-thirds of the Fortune 100) build with Roboflow. We now host the largest collection of open source computer vision datasets and pre-trained models[2]. We are pushing forward the CV ecosystem with open source projects like Autodistill[3] and Supervision[4]. And we've built one of the most comprehensive resources for software engineers to learn to use computer vision with our popular blog[5] and YouTube channel[6].
We have several openings available, but we are primarily looking for strong technical generalists who want to help us democratize computer vision, like to wear many hats, and want to have an outsized impact. Our engineering culture is built on a foundation of autonomy, and we don't consider an engineer fully ramped until they can "choose their own loss function". At Roboflow, engineers aren't just responsible for building things but also for helping us figure out what we should build next. We're builders and problem solvers, not just coders. (For this reason we also especially love hiring past and future founders.)
We're currently hiring full-stack engineers for our ML and web platform teams, a web developer to bridge our product and marketing teams, several technical roles on the sales & field engineering teams, and our first applied machine learning researcher to help push forward the state of the art in computer vision.
[1]: https://roboflow.com/?ref=whoishiring0224
[2]: https://roboflow.com/universe?ref=whoishiring0224
[3]: https://github.com/autodistill/autodistill
[4]: https://github.com/roboflow/supervision
[5]: https://blog.roboflow.com/?ref=whoishiring0224
[6]: https://www.youtube.com/@Roboflow
Project mention: Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model | news.ycombinator.com | 2023-12-21
I'm excited to introduce Emu2, the latest generative multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). Emu2 is an open-source initiative that reflects BAAI's commitment to fostering open, secure, and responsible AI research. It's designed to enhance AI's proficiency in handling tasks across various modalities with minimal examples and straightforward instructions.
Emu2 has demonstrated superior performance over other large-scale models like Flamingo-80B in few-shot multimodal understanding tasks. It serves as a versatile base model for developers, providing a flexible platform for crafting specialized multimodal applications.
Key features of Emu2 include:
- A more streamlined modeling framework than its predecessor, Emu.
- A decoder capable of reconstructing images from the encoder's semantic space.
- An expansion to 37 billion parameters, boosting both capabilities and generalization.
BAAI has also released fine-tuned versions, Emu2-Chat for visual understanding and Emu2-Gen for visual generation, which stand as some of the most powerful open-source models available today.
Here are the resources for those interested in exploring or contributing to Emu2:
- Project: https://baaivision.github.io/emu2/
- Model: https://huggingface.co/BAAI/Emu2
- Code: https://github.com/baaivision/Emu/tree/main/Emu2
- Demo: https://huggingface.co/spaces/BAAI/Emu2
- Paper: https://arxiv.org/abs/2312.13286
We're eager to see how the HN community engages with Emu2 and we welcome your feedback to help us improve. Let's collaborate to push the boundaries of multimodal AI!
Project mention: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | news.ycombinator.com | 2024-02-26
Project mention: A general representation model across vision, audio, language modalities | news.ycombinator.com | 2023-05-25
Project mention: This research project on reconstructing video stimulus to the brain using an MRI scanner and AI algorithms reminds me of the RDA brain reading technology | /r/Avatar | 2023-06-23
Project mention: 25 million Creative Commons image dataset released! | /r/StableDiffusion | 2023-10-01
GitHub: https://github.com/ml6team/fondant
Project mention: GRID: General Robot Intelligence Development Platform | news.ycombinator.com | 2023-10-17
Python foundation-models related posts
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
- Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model
- 25 million Creative Commons image dataset released!
- Show HN: Autodistill, automated image labeling with foundation vision models
- [P] AI image generation without copyright infringement
- This research project on reconstructing video stimulus to the brain using an MRI scanner and AI algorithms reminds me of the RDA brain reading technology
- Autodistill: Use foundation vision models to train smaller, supervised models
Index
What are some of the best open-source foundation-model projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | ColossalAI | 37,836 |
2 | unilm | 18,319 |
3 | LLaVA | 16,101 |
4 | Otter | 3,441 |
5 | NExT-GPT | 2,860 |
6 | Ask-Anything | 2,663 |
7 | EVA | 1,957 |
8 | chronos-forecasting | 1,589 |
9 | autodistill | 1,529 |
10 | Emu | 1,491 |
11 | lag-llama | 942 |
12 | InternVideo | 909 |
13 | ONE-PEACE | 838 |
14 | meerkat | 811 |
15 | MindVideo | 348 |
16 | fondant | 319 |
17 | GRID-playground | 243 |
18 | meta-prompting | 29 |