The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning. Learn more →
Vision_transformer Alternatives
Similar projects and alternatives to vision_transformer
-
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
ImageNet21K
Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(NeurIPS, 2021) paper
-
fashion-200k
Fashion 200K dataset used in paper "Automatic Spatially-aware Fashion Concept Discovery."
vision_transformer reviews and mentions
-
Can I use CLIP to tag my picture collection?
And one last thing, should I even be thinking of using CLIP for these tasks when Google has released a better model here: https://github.com/google-research/vision_transformer/blob/main/model_cards/lit.md
-
When the client's management is happy but their dev team is a pain
Google's vision transformers are type hinted.
-
Improving Search Quality for Non-English Queries with Fine-tuned Multilingual CLIP Models
We’re going to look at a model that Open AI has trained with a broad multilingual dataset: The xlm-roberta-base-ViT-B-32 CLIP model, which uses the ViT-B/32image encoder, and the XLM-RoBERTa multilingual language model. Both of these are pre-trained:
-
[R] How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
JAX Code: https://github.com/google-research/vision_transformer
- [D] (Paper Overview) MLP-Mixer: An all-MLP Architecture for Vision
-
[P] Animesion: a framework, for anime (and related) character recognition. It uses Vision Transformers trained on a subset of Danbooru2018, that we rebranded as DAF:re, and can classify a given image into one of more than 3000 characters! Source code and checkpoints included.
For this project I used the pretrained models released by Google in Jax, using this particular PyTorch custom implementation. Those were pretrained on ImageNet21k with 14 M images among 21 K classes. Then yes I finetune on two datasets: one with 15 K images and 170 characters, and one with 3 K characters and almost 500 K images.
- Short term memory solutions for video tasks?
-
A note from our sponsor - WorkOS
workos.com | 25 Apr 2024
Stats
google-research/vision_transformer is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of vision_transformer is Jupyter Notebook.
Popular Comparisons
- vision_transformer VS pytorch-image-models
- vision_transformer VS nerfstudio
- vision_transformer VS ImageNet21K
- vision_transformer VS Fashion12K_german_queries
- vision_transformer VS fashion-200k
- vision_transformer VS typeshed
- vision_transformer VS TorchSharp
- vision_transformer VS docarray
- vision_transformer VS beartype
- vision_transformer VS Pytorch
Sponsored