image_feature_extraction vs GPT4RoI

| | image_feature_extraction | GPT4RoI |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 0 | 455 |
| Growth | - | - |
| Activity | 0.0 | 4.6 |
| Latest commit | almost 3 years ago | 19 days ago |
| Language | Python | Python |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
image_feature_extraction
Class or Functions?
It’s a little of both; you can check out https://github.com/giakou4/image_feature_extraction. Whichever way I approach the issue, both seem correct to me: classes are a good way to organize the code, while functions are easier to debug.
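The trade-off above can be sketched in a few lines. This is an illustrative example only (the function and class names are hypothetical, not the actual API of the repo linked above): the same feature extractor written as a standalone function, then wrapped in a class that holds shared configuration.

```python
import numpy as np

# Function style: easy to call and debug in isolation.
def histogram_feature(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Reduce a grayscale image to a normalized intensity histogram."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

# Class style: groups related extractors and shared configuration (here, `bins`).
class FeatureExtractor:
    def __init__(self, bins: int = 8):
        self.bins = bins

    def histogram(self, image: np.ndarray) -> np.ndarray:
        return histogram_feature(image, self.bins)

img = np.random.default_rng(0).integers(0, 256, size=(32, 32))
feat_fn = histogram_feature(img)                      # function call
feat_cls = FeatureExtractor(bins=16).histogram(img)   # class call
```

Either style produces the same features; the class only pays off once several extractors share state or configuration.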
GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Instruction tuning large language models (LLMs) on image-text pairs has achieved unprecedented vision-language multimodal abilities. However, their vision-language alignment is built only at the image level; the lack of region-level alignment limits progress toward fine-grained multimodal understanding. In this paper, we propose instruction tuning on regions of interest. The key design is to reformulate the bounding box as a spatial instruction. Interleaved sequences of visual features extracted by the spatial instruction and language embeddings are fed to the LLM, which is trained on region-text data transformed into instruction-tuning format. Our region-level vision-language model, termed GPT4RoI, brings a brand-new conversational and interactive experience beyond image-level understanding. (1) Controllability: users can interact with our model through both language and spatial instructions to flexibly adjust the level of detail of a question. (2) Capacities: our model supports not only single-region but also multi-region spatial instructions. This unlocks more region-level multimodal capacities such as detailed region captioning and complex region reasoning. (3) Composition: any off-the-shelf object detector can serve as a spatial-instruction provider, so informative object attributes can be mined from our model, such as color, shape, material, action, and relation to other objects. The code, dataset, and demo can be found at https://github.com/jshilong/GPT4RoI.
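The "spatial instruction" idea from the abstract can be sketched as a small prompt-building step: each bounding box is bound to an indexed placeholder token in the question, and its normalized coordinates travel alongside the prompt so region features can later be interleaved at those positions. The token name `<region>` and the coordinate normalization here are assumptions for illustration, not GPT4RoI's actual format.

```python
def to_spatial_instruction(question: str,
                           boxes: list[tuple[float, float, float, float]],
                           width: int, height: int):
    """Replace each <region> placeholder with an indexed token (<region1>, ...)
    and normalize the matching (x1, y1, x2, y2) box to [0, 1].
    NOTE: hypothetical sketch, not the repo's real API."""
    prompt = question
    norm = []
    for i, (x1, y1, x2, y2) in enumerate(boxes, start=1):
        # Replace the first remaining placeholder with the i-th region token.
        prompt = prompt.replace("<region>", f"<region{i}>", 1)
        norm.append((x1 / width, y1 / height, x2 / width, y2 / height))
    return prompt, norm

prompt, norm = to_spatial_instruction(
    "What is <region> doing next to <region>?",
    [(10, 20, 110, 220), (300, 40, 380, 200)],
    width=640, height=480)
# prompt -> "What is <region1> doing next to <region2>?"
```

In the real model, the embedding at each region token position would be replaced (or interleaved) with visual features pooled from that box; this sketch only covers the text side.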
What are some alternatives?
fishington.io-bot - Fishington.io bot with OpenCV and NumPy
E2B - Secure cloud runtime for AI apps & AI agents. Fully open-source.
eulerian-remote-heartrate-detection - Remote heart rate detection through Eulerian magnification of face videos
vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
pykaldi - A Python wrapper for Kaldi
InternGPT - InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Fast-Poisson-Image-Editing - A fast Poisson image editing implementation that can utilize a multi-core CPU or GPU to handle high-resolution image input.
Woodpecker - ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
towhee - Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
yolo-tf2 - YOLO (all versions) implemented in Keras and TensorFlow 2.x
File-Injector - File Injector is a script that allows you to store any file in an image using steganography