Metric | datasaurus | Segment-Everything-Everywhere-All-At-Once
---|---|---
Mentions | 1 | 6
Stars | 11 | 4,064
Growth | - | 2.8%
Activity | 7.2 | 7.9
Last commit | 7 months ago | about 1 month ago
Language | TypeScript | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
datasaurus
- Is supervised learning dead for computer vision?
And let’s talk about development speed. By using text prompts to interact with your images, you can whip up a computer vision prototype in seconds. It’s fast, it’s efficient, and it’s changing the game.
So, what do you all think? Are we moving towards a future where foundational models take the lead in computer vision, or is there still a place for training models from scratch?
P.S. Shameless plug: I’ve been working on this open-source platform called Datasaurus (https://github.com/datasaurus-ai/datasaurus) that taps into the power of vision-language models. It’s all about helping engineers get the insights they need from images, fast. Just wanted to share some thoughts and start a conversation. Let’s talk about the future of computer vision!
Segment-Everything-Everywhere-All-At-Once
- Is supervised learning dead for computer vision?
Yes, you can. The model I was talking about, LLaVA, only outputs text, but other models such as SEEM (https://github.com/UX-Decoder/Segment-Everything-Everywhere-...) output a segmentation map. You could prompt the model with "Where is the pickleball in the image?" and get a segmentation map that you could then use to compute its center. Please let me know if you would be interested in having SEEM available in Datasaurus.
- The less i know the better
I think people are just seeing the rate of progress and rightly think that this stuff will be possible at some point. For rotoscoping, for example, here's some progress being made on that.
- A robot showing off his moves
Yeah, it's definitely possible, especially with all the recent advances. With segment-anything systems (like SAM) and segmentation on NeRF reconstructions already available, feasibility is mostly a matter of time investment. Naive "scene understanding" already runs in real time on a few AR headsets, but the papers from the past few weeks have made this much easier and faster to implement.
- SEEM: Segment Everything Everywhere All at Once
- [R] SEEM: Segment Everything Everywhere All at Once
Play with the demo on GitHub! https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once
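The reply to "Is supervised learning dead for computer vision?" above suggests computing an object's center from a segmentation map. Here is a minimal sketch of that last step, assuming the map has already been converted into a binary mask (a row-major grid where nonzero means "object"). The names `mask_center` and `example_mask` are hypothetical and used only for illustration; they are not part of SEEM or Datasaurus.

```python
from typing import Optional, Sequence, Tuple


def mask_center(mask: Sequence[Sequence[int]]) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of the nonzero pixels in a binary mask.

    Assumption: `mask` is a list of rows, and a nonzero value marks a pixel
    that belongs to the object. Returns None if no pixel is set.
    """
    total = 0
    sum_x = 0
    sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        return None  # nothing was segmented, so there is no center
    return (sum_x / total, sum_y / total)


# Hypothetical 4x5 mask standing in for a model's segmentation output.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
center = mask_center(example_mask)
print(center)
```

For the small mask above, the result is `(2.0, 1.5)`: x is the column index and y is the row index, both averaged over the object's pixels. The empty-mask case returns `None` so a caller can tell "object not found" apart from a real coordinate.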
What are some alternatives?
ai-health-assistant - An open source AI health assistant
segment-anything - The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Segment-Everything-Everywhere-
squirrel-datasets-core - Squirrel dataset hub
deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
guidance - A guidance language for controlling large language models.
LoRA - Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
squirrel-core - A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way
autodistill - Images to inference with no labeling (use foundation models to train supervised models).