Vision Transformer - work4ai

CLIP ViT-H/14

We release a new ViT-G/14 CLIP model with OpenCLIP which achieves 80.1% zero-shot accuracy on ImageNet and 74.9% zero-shot image retrieval (Recall@5) on MS COCO. As of January 2023, this is the best open source CLIP model.

https://t.co/TmVTUP3tBx

https://t.co/PMnpUUTNpc LAION
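
For readers unfamiliar with the Recall@5 figure quoted above, here is a minimal sketch of how a Recall@K score for text-to-image retrieval could be computed from embeddings. It is not LAION's evaluation code. The function names (`cosine`, `recall_at_k`), the toy 2-D vectors, and the assumption of exactly one ground-truth image per caption are all made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def recall_at_k(text_embs, image_embs, gt_index, k=5):
    """Fraction of captions whose ground-truth image ranks in the top k.

    text_embs:  one embedding per caption (hypothetical toy input)
    image_embs: one embedding per candidate image
    gt_index:   gt_index[q] is the index of the correct image for caption q
    """
    hits = 0
    for q, emb in enumerate(text_embs):
        scores = [cosine(emb, img) for img in image_embs]
        # Rank candidate images from most to least similar to the caption.
        ranked = sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)
        if gt_index[q] in ranked[:k]:
            hits += 1
    return hits / len(text_embs)


# Toy data (assumed, not real CLIP embeddings): 3 images, 2 captions.
image_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text_embs = [[0.9, 0.1], [-0.8, 0.2]]
gt_index = [0, 1]

print(recall_at_k(text_embs, image_embs, gt_index, k=1))
print(recall_at_k(text_embs, image_embs, gt_index, k=2))
```

In a real evaluation the embeddings would come from the CLIP text and image encoders, and the score would be averaged over the full MS COCO test split rather than a toy set.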

https://gyazo.com/8156059952ad2cb62654ce8040937d8d

https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k

https://arxiv.org/abs/2302.05442

When people just say "ViT", I think they mean /motoso/Vision Transformer. 基素.icon