[R] CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory + Code + Robot demo

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • clip-fields

    Teaching robots to respond to open-vocab queries with CLIP and NeRF-like neural fields

  • The best part, I believe, is that you should be able to train your own CLIP-Field for your living room if you have an hour, a decent GPU, and a way to capture RGB-D video (an iPhone 13 Pro works great!). I hope you can give the code a try: https://github.com/notmahi/clip-fields, or check out the website https://mahis.life/clip-fields/ for more interactive demos. Our arXiv submission is also out now at https://arxiv.org/abs/2210.05663, and if you want a longer tl;dr with a couple more videos, check out this tweet. Thanks!

  • Detic

    Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

  • We made this using pretty recent advances in web-data pretrained models like Detic and LSeg for detection, CLIP for visual queries, and Sentence BERT for semantic queries. Our "database" is really a neural field (Instant NGP) that maps from 3D coordinates to a high dimensional embedding vector in the same representation space as CLIP and SBERT.
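  • The lookup described above can be sketched in a few lines. This is a toy illustration only, with hypothetical stand-ins: `semantic_field` plays the role of the trained Instant-NGP field, and `encode_text` stands in for the real CLIP/SBERT text encoders.

    ```python
    import numpy as np

    # Toy sketch of the open-vocabulary lookup: a "field" maps 3D points to
    # embeddings, and a text query is answered by cosine similarity in that
    # shared space. All components are hypothetical stand-ins, not the real
    # Instant-NGP field or CLIP/SBERT encoders.

    EMB_DIM = 8                      # real CLIP embeddings are 512+ dimensional
    rng = np.random.default_rng(0)
    field_weights = rng.normal(size=(3, EMB_DIM))  # frozen random "field"

    def semantic_field(points: np.ndarray) -> np.ndarray:
        """Stand-in neural field: (N, 3) coordinates -> (N, EMB_DIM)
        embeddings in the same space as the text encoder's output."""
        return np.tanh(points @ field_weights)

    def encode_text(query: str) -> np.ndarray:
        """Stand-in text encoder: deterministic per-query unit vector."""
        seed = sum(query.encode())   # avoid hash(): not stable across runs
        v = np.random.default_rng(seed).normal(size=EMB_DIM)
        return v / np.linalg.norm(v)

    def locate(query: str, points: np.ndarray):
        """Return the candidate point whose field embedding has the highest
        cosine similarity with the query embedding."""
        text_emb = encode_text(query)
        embs = semantic_field(points)
        embs /= np.linalg.norm(embs, axis=1, keepdims=True)
        best = int(np.argmax(embs @ text_emb))
        return points[best], float((embs @ text_emb)[best])

    points = rng.normal(size=(100, 3))  # e.g. points sampled from a room scan
    best_point, score = locate("a red mug on the table", points)
    print(best_point, score)
    ```

    In the real system the field is trained so that a point's embedding matches the CLIP/SBERT embeddings of the weakly supervised labels observed at that point, which is what makes the similarity search meaningful.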

  • lang-seg

    Language-Driven Semantic Segmentation

  • CLIP

    CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

  • sentence-transformers

    Multilingual Sentence & Image Embeddings with BERT

  • instant-ngp

    Instant neural graphics primitives: lightning fast NeRF and more

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
