| | open_flamingo | Emu |
|---|---|---|
| Mentions | 4 | 2 |
| Stars | 3,493 | 1,510 |
| Growth | 2.8% | 3.4% |
| Activity | 6.8 | 7.4 |
| Last commit | 11 days ago | 2 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
open_flamingo
- Are there any multimodal AI models I can use to provide a paired text *and* image input, to then generate an expanded descriptive text output? [D]
  Maybe the recent OpenFlamingo gives you better results (they have a demo on HF).
- [D] Multi modal for visual qna based on a given image. Need suggestions.
- Open Flamingo: An open-source framework for training large multimodal models
- Announcing OpenFlamingo: An open-source framework for training vision-language models with in-context learning | LAION
  Code here: https://github.com/mlfoundations/open_flamingo
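
For the use case in the first question above (paired image + text in, descriptive text out), here is a minimal sketch following the open_flamingo README; the OpenFlamingo-3B checkpoint name is assumed from the project's Hugging Face releases, and `cat.jpg` is a placeholder path:

```python
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
from open_flamingo import create_model_and_transforms

# Build the model: a CLIP vision encoder cross-attended into a frozen LM.
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",
    tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
    cross_attn_every_n_layers=1,
)

# Checkpoint repo name assumed from the OpenFlamingo HF releases.
ckpt = hf_hub_download("openflamingo/OpenFlamingo-3B-vitl-mpt1b", "checkpoint.pt")
model.load_state_dict(torch.load(ckpt), strict=False)

# vision_x has shape (batch, num_images, num_frames, C, H, W).
image = Image.open("cat.jpg")  # placeholder image
vision_x = image_processor(image).unsqueeze(0).unsqueeze(1).unsqueeze(0)

# "<image>" marks where each image slots into the interleaved prompt.
tokenizer.padding_side = "left"
lang_x = tokenizer(["<image>An image of"], return_tensors="pt")

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=30,
    num_beams=3,
)
print(tokenizer.decode(generated[0]))
```

Because the model is trained on interleaved image-text sequences, you can prepend few-shot examples (`<image>caption<|endofchunk|>` pairs) to the prompt for in-context learning.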
Emu
- Show HN: Emu2 – A Gemini-like open-source 37B Multimodal Model
I'm excited to introduce Emu2, the latest generative multimodal model developed by the Beijing Academy of Artificial Intelligence (BAAI). Emu2 is an open-source initiative that reflects BAAI's commitment to fostering open, secure, and responsible AI research. It's designed to enhance AI's proficiency in handling tasks across various modalities with minimal examples and straightforward instructions.
Emu2 has demonstrated superior performance over other large-scale models like Flamingo-80B in few-shot multimodal understanding tasks. It serves as a versatile base model for developers, providing a flexible platform for crafting specialized multimodal applications.
Key features of Emu2 include:
- A more streamlined modeling framework than its predecessor, Emu.
- A decoder capable of reconstructing images from the encoder's semantic space.
- An expansion to 37 billion parameters, boosting both capabilities and generalization.
BAAI has also released fine-tuned versions, Emu2-Chat for visual understanding and Emu2-Gen for visual generation, which stand as some of the most powerful open-source models available today.
Here are the resources for those interested in exploring or contributing to Emu2:
- Project: https://baaivision.github.io/emu2/
- Model: https://huggingface.co/BAAI/Emu2
- Code: https://github.com/baaivision/Emu/tree/main/Emu2
- Demo: https://huggingface.co/spaces/BAAI/Emu2
- Paper: https://arxiv.org/abs/2312.13286
We're eager to see how the HN community engages with Emu2 and we welcome your feedback to help us improve. Let's collaborate to push the boundaries of multimodal AI!
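
For a quick start with the released weights, here is a sketch based on the usage example on the BAAI/Emu2 Hugging Face model card; the `build_input_ids` helper and `[<IMG_PLH>]` placeholder come from the card's `trust_remote_code` implementation and may change between releases, and `example.jpg` is a placeholder path:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/Emu2")
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu2",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # pulls in the model-card helpers used below
).to("cuda").eval()

# "[<IMG_PLH>]" marks where the image embeddings are spliced into the prompt.
query = "[<IMG_PLH>]Describe the image in details:"
image = Image.open("example.jpg").convert("RGB")  # placeholder image

# build_input_ids is defined by the model card's remote code (an assumption
# about the current release), and packs text plus images into one sequence.
inputs = model.build_input_ids(
    text=[query],
    tokenizer=tokenizer,
    image=[image],
)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image=inputs["image"].to(torch.bfloat16),
        max_new_tokens=64,
        length_penalty=-1,
    )

print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

For dialogue-style visual understanding or image generation, the fine-tuned Emu2-Chat and Emu2-Gen checkpoints mentioned above would be the more natural starting points.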
What are some alternatives?
- transformers - 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- Painter - Painter & SegGPT Series: Vision Foundation Models from BAAI
- pykale - Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
- instruct-eval - This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
- speechbrain - A PyTorch-based Speech Toolkit
- ColossalAI - Making large AI models cheaper, faster and more accessible
- icl-ceil - [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning".
- LLMSurvey - The official GitHub page for the survey paper "A Survey of Large Language Models".