robo-vln
DallEval
Metric | robo-vln | DallEval
---|---|---
Mentions | 2 | 1
Stars | 61 | 133
Growth | - | -
Activity | 2.9 | 3.6
Last commit | 10 months ago | 5 months ago
Language | Python | Jupyter Notebook
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
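The exact formula behind the activity number is not given here, so the following is only a minimal sketch of how such a metric could be built. It assumes an exponential decay with a hypothetical `half_life_days` parameter for weighting recent commits, and a simple rank-based mapping onto a 0-10 scale; the function names are illustrative and are not part of any published API.

```python
from bisect import bisect_left


def weighted_commit_score(commit_ages_days, half_life_days=30.0):
    """Sum per-commit weights that halve every `half_life_days`.

    Assumption: exponential decay; the site's real weighting is not stated.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_from_rank(score, all_scores):
    """Map a raw score to a 0-10 scale.

    The result is ten times the fraction of tracked projects whose
    score is strictly lower than `score`, rounded to one decimal.
    """
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)
    return round(10.0 * below / len(ordered), 1)


if __name__ == "__main__":
    # A commit made today counts fully; one made 30 days ago counts half.
    print(weighted_commit_score([0, 30]))
    # With ten tracked projects, the highest-scoring one outranks 90% of them.
    print(activity_from_rank(9, list(range(10))))
```

Under this rank-based mapping, a project that outscores 90% of the tracked projects gets 9.0, which matches the "top 10%" reading in the example above.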
robo-vln
- [R] Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
  Project webpage: https://zubair-irshad.github.io/projects/robo-vln.html
  PyTorch code and dataset: https://github.com/GT-RIPL/robo-vln
  arXiv paper: https://arxiv.org/abs/2104.1067
DallEval
- [N] [D] OpenAI, who runs DALLE-2, allegedly threatened creator of DALLE-Mini
  There are also other users of the DALL-E name: Sberbank's ruDALL-E or Kakao Brain's minDALL-E, or how about the benchmark DALL-Eval?
What are some alternatives?
hope-autonomous-driving - Autonomous Driving project for Euro Truck Simulator 2 Running on Real World
DALL-E - PyTorch package for the discrete VAE used for DALL·E.
ai-deadlines - AI conference deadline countdowns
dalle-mini - DALL·E Mini - Generate images from a text prompt
virtual_drawing_board - Virtual whiteboard with hand pose estimation
ru-dalle - Generate images from text prompts in Russian
LAVIS - A One-stop Library for Language-Vision Intelligence
multimodal - A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"
Mask3D - Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.
ALPRO - Align and Prompt: Video-and-Language Pre-training with Entity Prompts
IntroDLPython - A repository of introductory deep learning projects in Python, updated over time.
conceptual-12m - Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.