Top 22 Python foundation-model Projects
Scout Monitoring
Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
unilm
Project mention: A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images? | news.ycombinator.com | 2024-06-07
Has anyone tried Kosmos [0]? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.
[0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Project mention: Show HN: LLM Aided OCR (Correcting Tesseract OCR Errors with LLMs) | news.ycombinator.com | 2024-08-09
This package seems to use llama_cpp for local inference [1] so you can probably use anything supported by that [2]. However, I think it's just passing OCR output for correction - the language model doesn't actually see the original image.
That said, there are some large language models you can run locally which accept image input. Phi-3-Vision [3], LLaVA [4], MiniCPM-V [5], etc.
[1] - https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main...
[2] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#de...
[3] - https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
[4] - https://github.com/haotian-liu/LLaVA
[5] - https://github.com/OpenBMB/MiniCPM-V
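The text-only correction step described above can be sketched in a few lines. This is a hedged illustration, not the package's actual code: the prompt wording, function names, and the stand-in `demo_llm` are all assumptions; in practice the `llm` callable would wrap a local llama_cpp model.

```python
def build_correction_prompt(ocr_text: str) -> str:
    """Wrap raw OCR output in a correction instruction. The model only
    ever sees the extracted text, never the original image, so it can
    only fix errors that are recoverable from textual context."""
    return (
        "Correct any OCR errors in the following text. Fix misrecognized "
        "characters and broken words, but do not rephrase or add content.\n\n"
        "---\n" + ocr_text + "\n---"
    )

def correct_ocr(ocr_text: str, llm) -> str:
    """`llm` is any callable mapping a prompt to completion text,
    e.g. a wrapper around a local llama_cpp model."""
    return llm(build_correction_prompt(ocr_text))

# Stand-in "model" that fixes one classic OCR confusion ("m" read as "rn"):
demo_llm = lambda p: p.split("---")[1].strip().replace("rn", "m")
fixed = correct_ocr("The rnodel corrects errors.", demo_llm)
```

The point of the split is that any backend satisfying the one-argument callable contract can be swapped in without touching the prompt logic.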
Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
chronos-forecasting
Project mention: TimesFM (Time Series Foundation Model) for time-series forecasting | news.ycombinator.com | 2024-05-08
On a related note, Amazon also had a model for time series forecasting called Chronos.
https://github.com/amazon-science/chronos-forecasting
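Chronos's central idea is to treat forecasting as language modeling: the context window is mean-scaled and quantized into a small vocabulary of bins, so a transformer can predict the series token by token. A stdlib sketch of roughly that tokenization step (the bin count and clipping range here are illustrative choices, not the paper's settings):

```python
def tokenize_series(values, n_bins=10, low=-3.0, high=3.0):
    """Mean-absolute-scale a series, then map each value to a discrete
    bin id, so a language model can treat it as a token sequence."""
    scale = sum(abs(v) for v in values) / len(values) or 1.0
    width = (high - low) / n_bins
    tokens = []
    for v in values:
        s = min(max(v / scale, low), high - 1e-9)  # clip to the bin range
        tokens.append(int((s - low) / width))
    return tokens, scale

def detokenize(tokens, scale, n_bins=10, low=-3.0, high=3.0):
    """Map bin ids back to approximate values at the bin centers."""
    width = (high - low) / n_bins
    return [(low + (t + 0.5) * width) * scale for t in tokens]

tokens, scale = tokenize_series([1.0, 2.0, 3.0, 4.0])
approx = detokenize(tokens, scale)
```

Quantization loses at most half a bin width per value (in scaled units), which is the trade for getting a fixed, finite vocabulary.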
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
autodistill
Images to inference with no labeling (use foundation models to train supervised models).
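The pattern the one-liner above describes — a large foundation model auto-labels raw images, then a small supervised model is trained on the result — can be sketched with stand-in models. Everything below (the lambda "base model", the threshold "trainer") is a toy assumption to show the data flow, not autodistill's actual API:

```python
def distill(images, base_model, train_target):
    """Auto-label with the slow, general base model, then fit the fast,
    specialized target model on the generated dataset -- no human
    labeling in the loop."""
    dataset = [(img, base_model(img)) for img in images]
    return train_target(dataset)

# Stand-in "foundation model": labels an image by its mean brightness.
base = lambda img: "bright" if sum(img) / len(img) > 0.5 else "dark"

# Stand-in trainer: learns a threshold from the auto-labeled data.
def train_threshold(dataset):
    brights = [sum(i) / len(i) for i, lbl in dataset if lbl == "bright"]
    darks = [sum(i) / len(i) for i, lbl in dataset if lbl == "dark"]
    cut = (min(brights) + max(darks)) / 2
    return lambda img: "bright" if sum(img) / len(img) > cut else "dark"

images = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.9], [0.3, 0.1]]
student = distill(images, base, train_threshold)
```

The student inherits the base model's labeling behavior on this distribution while being far cheaper to run, which is the whole economic argument for distillation.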
Emu
Project mention: Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model | news.ycombinator.com | 2023-12-21
I'm excited to introduce Emu2, the latest generative multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). Emu2 is an open-source initiative that reflects BAAI's commitment to fostering open, secure, and responsible AI research. It's designed to enhance AI's proficiency in handling tasks across various modalities with minimal examples and straightforward instructions.
Emu2 has demonstrated superior performance over other large-scale models like Flamingo-80B in few-shot multimodal understanding tasks. It serves as a versatile base model for developers, providing a flexible platform for crafting specialized multimodal applications.
Key features of Emu2 include:
- A more streamlined modeling framework than its predecessor, Emu.
- A decoder capable of reconstructing images from the encoder's semantic space.
- An expansion to 37 billion parameters, boosting both capabilities and generalization.
BAAI has also released fine-tuned versions, Emu2-Chat for visual understanding and Emu2-Gen for visual generation, which stand as some of the most powerful open-source models available today.
Here are the resources for those interested in exploring or contributing to Emu2:
- Project: https://baaivision.github.io/emu2/
- Model: https://huggingface.co/BAAI/Emu2
- Code: https://github.com/baaivision/Emu/tree/main/Emu2
- Demo: https://huggingface.co/spaces/BAAI/Emu2
- Paper: https://arxiv.org/abs/2312.13286
We're eager to see how the HN community engages with Emu2 and we welcome your feedback to help us improve. Let's collaborate to push the boundaries of multimodal AI!
lag-llama
Project mention: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | news.ycombinator.com | 2024-02-26
ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
GRID-playground
Project mention: GRID: General Robot Intelligence Development Platform | news.ycombinator.com | 2023-10-17
aurora
Project mention: Open-source release of Aurora: a foundation model of the atmosphere | news.ycombinator.com | 2024-09-19
I'm impressed by how many new benchmarks the Qwen team ran. As the old benchmarks get saturated/overfit, new ones are of course required. Some of the latest ones they use include:
* MMLU-Pro https://github.com/TIGER-AI-Lab/MMLU-Pro - a new more challenging (and improved in other areas) version of MMLU that does a better job separating out the current top models
* MixEval(-Hard) https://github.com/Psycoy/MixEval - a very quick/cheap eval with high correlation w/ Chatbot Arena ELOs and (statistically correlated) dynamically swappable question sets
* Arena Hard https://github.com/lm-sys/arena-hard-auto - another automatic eval tool that uses LLM-as-a-Judge w/ high correlation w/ Chatbot Arena / human rankings
* LiveCodeBench https://livecodebench.github.io/ - a coding test with different categories based off of LeetCode problems that also lets you filter/compare scores by problem release month to see the impact of overfitting/contamination
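"High correlation with Chatbot Arena" in the claims above is usually a rank correlation between benchmark scores and Arena ELOs. A stdlib sketch of that check (the scores below are invented for illustration; ties are ignored for simplicity):

```python
def spearman_rank_correlation(xs, ys):
    """Spearman's rho: Pearson correlation computed on ranks, which is
    what 'agrees with Arena rankings' means in practice (ties ignored)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical benchmark scores vs. Arena ELOs for five models:
bench = [82.1, 75.4, 68.0, 61.2, 55.9]
elo = [1251, 1210, 1190, 1105, 1098]
rho = spearman_rank_correlation(bench, elo)  # ~1.0: identical rankings
```

A rho near 1.0 means the cheap benchmark orders models the same way human preference voting does, which is the property these eval tools advertise.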
meta-prompting
Official implementation of paper "Meta Prompting for AI Systems" (https://arxiv.org/abs/2311.11482)
Lexicon3D
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
This framework extracts features from various foundation models, constructs 3D feature embeddings as scene embeddings, and evaluates them on multiple downstream tasks. The paper presents a novel approach to representing complex indoor scenes using a combination of 2D and 3D modalities, such as posed images, videos, and 3D point clouds. The extracted feature embeddings from image- and video-based models are projected into 3D space using a multi-view 3D projection module for subsequent 3D scene evaluation tasks.
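Per view, the multi-view 3D projection described above reduces to standard pinhole unprojection: each pixel with known depth is lifted into camera space and carries its 2D feature vector along. A toy stdlib sketch of that geometric step (the intrinsics and the 2x2 feature map are made-up values, not Lexicon3D's actual pipeline):

```python
def unproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth into camera-space 3D using the
    pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def project_features_to_3d(feature_map, depth_map, fx, fy, cx, cy):
    """Attach each 2D feature vector to the 3D point its pixel
    back-projects to -- the per-view step before features from all
    views are aggregated into a scene embedding."""
    points = []
    for v, row in enumerate(feature_map):
        for u, feat in enumerate(row):
            z = depth_map[v][u]
            if z > 0:  # skip pixels with no valid depth
                points.append((unproject_pixel(u, v, z, fx, fy, cx, cy), feat))
    return points

# Toy 2x2 feature map, constant depth 2.0, centered principal point:
feats = [[(0.1,), (0.2,)], [(0.3,), (0.4,)]]
depth = [[2.0, 2.0], [2.0, 2.0]]
cloud = project_features_to_3d(feats, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Aggregating such per-view point-feature pairs from many cameras is what yields a 3D feature embedding for the whole scene.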
tf-gpt
Project mention: Show HN: TF-GPT – a TensorFlow implementation of a decoder-only transformer | news.ycombinator.com | 2024-06-26
Python foundation-models related posts
- MT-Bench: Comparing different LLM Judges
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
- Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model
- 25 million Creative Commons image dataset released!
- Show HN: Autodistill, automated image labeling with foundation vision models
- [P] AI image generation without copyright infringement
- This research project on reconstructing video stimulus to the brain using an MRI scanner and AI algorithms reminds me of the RDA brain reading technology
A note from our sponsor - Scout Monitoring
www.scoutapm.com | 14 Oct 2024
Index
What are some of the best open-source foundation-model projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | ColossalAI | 38,699 |
2 | unilm | 19,719 |
3 | LLaVA | 19,655 |
4 | Otter | 3,560 |
5 | NExT-GPT | 3,241 |
6 | Ask-Anything | 3,012 |
7 | chronos-forecasting | 2,413 |
8 | EVA | 2,244 |
9 | autodistill | 1,897 |
10 | Emu | 1,629 |
11 | InternVideo | 1,338 |
12 | lag-llama | 1,219 |
13 | ONE-PEACE | 946 |
14 | meerkat | 824 |
15 | MindVideo | 364 |
16 | fondant | 339 |
17 | GRID-playground | 260 |
18 | aurora | 218 |
19 | MixEval | 211 |
20 | meta-prompting | 88 |
21 | Lexicon3D | 33 |
22 | tf-gpt | 2 |