Vision Transformer
CLIP ViT-H/14
CLIP ViT-L/14
We release a new CLIP ViT-G/14 model with OpenCLIP which achieves 80.1% zero-shot accuracy on ImageNet and 74.9% zero-shot image retrieval (Recall@5) on MS COCO. As of January 2023, this is the best open source CLIP model.
https://t.co/TmVTUP3tBx
https://t.co/PMnpUUTNpc LAION
https://gyazo.com/8156059952ad2cb62654ce8040937d8d
https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
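The Recall@5 figure quoted above is a retrieval metric: for each text query, the images are ranked by embedding similarity, and the query counts as a hit if its matching image appears in the top 5. The following is only a minimal sketch of that idea, not LAION's evaluation code. It assumes caption i matches image i (the real MS COCO setup has several captions per image), and the embeddings are hand-made toy vectors, not outputs of any CLIP model. The names `cosine` and `recall_at_k` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(text_embs, image_embs, k):
    """Fraction of text queries whose matching image ranks in the top k.

    Assumption: text_embs[i] is the caption for image_embs[i].
    """
    hits = 0
    for i, query in enumerate(text_embs):
        scores = [cosine(query, img) for img in image_embs]
        # Image indices sorted from most to least similar to the query.
        ranked = sorted(range(len(image_embs)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_embs)


# Toy embeddings for illustration only.
image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_embs = [[0.9, 0.1, 0.0], [0.6, 0.5, 0.0], [0.1, 0.0, 0.9]]

print(recall_at_k(text_embs, image_embs, k=1))
print(recall_at_k(text_embs, image_embs, k=2))
```

In this toy data the second caption is slightly closer to the first image than to its own, so it misses at k=1 but is recovered at k=2. That is why Recall@K never decreases as K grows.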
LAION
https://arxiv.org/abs/2302.05442
When people just say "ViT", I think they mean /motoso/Vision Transformer. 基素.icon