Python foundation-models

Open-source Python projects categorized as foundation-models

Top 22 Python foundation-model Projects

  • ColossalAI

    Making large AI models cheaper, faster and more accessible

    Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Project mention: A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images? | news.ycombinator.com | 2024-06-07

    Has anyone tried Kosmos [0]? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.

    [0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5

  • LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

    Project mention: Show HN: LLM Aided OCR (Correcting Tesseract OCR Errors with LLMs) | news.ycombinator.com | 2024-08-09

    This package seems to use llama_cpp for local inference [1] so you can probably use anything supported by that [2]. However, I think it's just passing OCR output for correction - the language model doesn't actually see the original image.

    That said, there are some large language models you can run locally which accept image input. Phi-3-Vision [3], LLaVA [4], MiniCPM-V [5], etc.

    [1] - https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main...

    [2] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#de...

    [3] - https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

    [4] - https://github.com/haotian-liu/LLaVA

    [5] - https://github.com/OpenBMB/MiniCPM-V

  • Otter

    🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

  • NExT-GPT

    Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

  • Ask-Anything

    [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

  • chronos-forecasting

    Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting

    Project mention: TimesFM (Time Series Foundation Model) for time-series forecasting | news.ycombinator.com | 2024-05-08

    On a related note, Amazon also had a model for time series forecasting called Chronos.

    https://github.com/amazon-science/chronos-forecasting

  • EVA

    EVA Series: Visual Representation Fantasies from BAAI (by baaivision)

  • autodistill

    Images to inference with no labeling (use foundation models to train supervised models).

    Project mention: Ask HN: Who is hiring? (October 2024) | news.ycombinator.com | 2024-10-01
  • Emu

    Emu Series: Generative Multimodal Models from BAAI (by baaivision)

    Project mention: Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model | news.ycombinator.com | 2023-12-21

    I'm excited to introduce Emu2, the latest generative multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). Emu2 is an open-source initiative that reflects BAAI's commitment to fostering open, secure, and responsible AI research. It's designed to enhance AI's proficiency in handling tasks across various modalities with minimal examples and straightforward instructions.

    Emu2 has demonstrated superior performance over other large-scale models like Flamingo-80B in few-shot multimodal understanding tasks. It serves as a versatile base model for developers, providing a flexible platform for crafting specialized multimodal applications.

    Key features of Emu2 include:

    - A more streamlined modeling framework than its predecessor, Emu.

    - A decoder capable of reconstructing images from the encoder's semantic space.

    - An expansion to 37 billion parameters, boosting both capabilities and generalization.

    BAAI has also released fine-tuned versions, Emu2-Chat for visual understanding and Emu2-Gen for visual generation, which stand as some of the most powerful open-source models available today.

    Here are the resources for those interested in exploring or contributing to Emu2:

    - Project: https://baaivision.github.io/emu2/

    - Model: https://huggingface.co/BAAI/Emu2

    - Code: https://github.com/baaivision/Emu/tree/main/Emu2

    - Demo: https://huggingface.co/spaces/BAAI/Emu2

    - Paper: https://arxiv.org/abs/2312.13286

    We're eager to see how the HN community engages with Emu2 and we welcome your feedback to help us improve. Let's collaborate to push the boundaries of multimodal AI!

  • InternVideo

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

  • lag-llama

    Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

    Project mention: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | news.ycombinator.com | 2024-02-26
  • ONE-PEACE

    A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

  • meerkat

    Creative interactive views of any dataset.

  • MindVideo

    Official code base for MinD-Video

  • fondant

    Production-ready data processing made easy and shareable

  • GRID-playground

    Platform for General Robot Intelligence Development

    Project mention: GRID: General Robot Intelligence Development Platform | news.ycombinator.com | 2023-10-17
  • aurora

    Implementation of the Aurora model for atmospheric forecasting (by microsoft)

    Project mention: Open-source release of Aurora: a foundation model of the atmosphere | news.ycombinator.com | 2024-09-19
  • MixEval

    The official evaluation suite and dynamic data release for MixEval.

    Project mention: Qwen2 LLM Released | news.ycombinator.com | 2024-06-07

    I'm impressed by how many of the new benchmarks the Qwen team ran. As the old benchmarks get saturated/overfit, new ones are of course required. Some of the latest ones they use include:

    * MMLU-Pro https://github.com/TIGER-AI-Lab/MMLU-Pro - a new more challenging (and improved in other areas) version of MMLU that does a better job separating out the current top models

    * MixEval(-Hard) https://github.com/Psycoy/MixEval - a very quick/cheap eval that has high correlation w/ Chatbot Arena ELOs and that can be run w/ (statistically correlated) dynamically swappable question sets

    * Arena Hard https://github.com/lm-sys/arena-hard-auto - another automatic eval tool that uses LLM-as-a-Judge w/ high correlation w/ Chatbot Arena / human rankings

    * LiveCodeBench https://livecodebench.github.io/ - a coding test with different categories based off of LeetCode problems that also lets you filter/compare scores by problem release month to see the impact of overfitting/contamination

  • meta-prompting

    Official implementation of paper "Meta Prompting for AI Systems" (https://arxiv.org/abs/2311.11482)

    Project mention: Meta Prompting for AGI Systems | news.ycombinator.com | 2024-02-29
  • Lexicon3D

    [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

    Project mention: Voxel51 Filtered Views Newsletter - September 20, 2024 | dev.to | 2024-09-20

    Lexicon3D: This framework extracts features from various foundation models, constructs 3D feature embeddings as scene embeddings, and evaluates them on multiple downstream tasks. The paper presents a novel approach to representing complex indoor scenes using a combination of 2D and 3D modalities, such as posed images, videos, and 3D point clouds. The extracted feature embeddings from image- and video-based models are projected into 3D space using a multi-view 3D projection module for subsequent 3D scene evaluation tasks.

  • tf-gpt

    A TensorFlow implementation of GPT.

    Project mention: Show HN: TF-GPT – a TensorFlow implementation of a decoder-only transformer | news.ycombinator.com | 2024-06-26
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
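
The Lexicon3D mention above describes projecting image-based feature embeddings into 3D space with a multi-view projection module. The following is only a rough sketch of that general idea, not Lexicon3D's actual code: it assumes pinhole cameras with identity rotation, ignores occlusion, and every name in it (`project_point`, `lift_features`, the nested-list feature maps) is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


def project_point(point: Vec3, cam_pos: Vec3, focal: float,
                  cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection for a camera at cam_pos with identity rotation (assumption)."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        return None  # point lies behind the camera
    u = int(round(focal * x / z + cx))  # pixel column
    v = int(round(focal * y / z + cy))  # pixel row
    return u, v


def lift_features(points: Sequence[Vec3],
                  views: Sequence[Tuple[Vec3, FeatureMap]],
                  focal: float, cx: float, cy: float) -> List[Optional[List[float]]]:
    """Average the 2D features of every view whose image contains the projected point."""
    lifted: List[Optional[List[float]]] = []
    for point in points:
        total: Optional[List[float]] = None
        hits = 0
        for cam_pos, feat_map in views:
            pix = project_point(point, cam_pos, focal, cx, cy)
            if pix is None:
                continue
            u, v = pix
            if 0 <= v < len(feat_map) and 0 <= u < len(feat_map[0]):
                feat = feat_map[v][u]
                total = list(feat) if total is None else [a + b for a, b in zip(total, feat)]
                hits += 1
        # Points that no view sees get no feature at all.
        lifted.append([a / hits for a in total] if total is not None else None)
    return lifted
```

A real pipeline would also need camera rotations and a depth test so that occluded points do not pick up features from surfaces in front of them; both are left out here to keep the sketch short.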

Python foundation-models related posts

  • MT-Bench: Comparing different LLM Judges

    2 projects | dev.to | 8 Jun 2024
  • Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

    1 project | news.ycombinator.com | 26 Feb 2024
  • Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model

    1 project | news.ycombinator.com | 21 Dec 2023
  • 25 million Creative Commons image dataset released!

    1 project | /r/StableDiffusion | 1 Oct 2023
  • Show HN: Autodistill, automated image labeling with foundation vision models

    1 project | news.ycombinator.com | 6 Sep 2023
  • [P] AI image generation without copyright infringement

    1 project | /r/MachineLearning | 29 Jun 2023
  • This research project on reconstructing video stimulus to the brain using an MRI scanner and AI algorithms reminds me of the RDA brain reading technology

    1 project | /r/Avatar | 23 Jun 2023

Index

What are some of the best open-source foundation-model projects in Python? This list will help you:

 #  Project               Stars
 1  ColossalAI           38,699
 2  unilm                19,719
 3  LLaVA                19,655
 4  Otter                 3,560
 5  NExT-GPT              3,241
 6  Ask-Anything          3,012
 7  chronos-forecasting   2,413
 8  EVA                   2,244
 9  autodistill           1,897
10  Emu                   1,629
11  InternVideo           1,338
12  lag-llama             1,219
13  ONE-PEACE               946
14  meerkat                 824
15  MindVideo               364
16  fondant                 339
17  GRID-playground         260
18  aurora                  218
19  MixEval                 211
20  meta-prompting           88
21  Lexicon3D                33
22  tf-gpt                    2
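
The ordering rule used by this list (descending GitHub star count) can be reproduced with a short sketch. The star counts below are copied from a few rows of the index; the function name `rank_by_stars` is hypothetical and is not part of any site API.

```python
def rank_by_stars(stars: dict[str, int]) -> list[tuple[int, str, int]]:
    """Return (rank, project, stars) tuples ordered by descending star count."""
    ordered = sorted(stars.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, count) for rank, (name, count) in enumerate(ordered, start=1)]


# A deliberately shuffled subset of the index rows.
stars = {"tf-gpt": 2, "LLaVA": 19_655, "ColossalAI": 38_699, "unilm": 19_719}

for rank, name, count in rank_by_stars(stars):
    print(f"{rank:>2}  {name:<12} {count:>7,}")
```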
