CLIP
Visual Prompting
Multimodal
Foundation model
Video foundation model
Vision Language Model
CLIP: A Multimodal Foundation Model for Language and Images (article in Japanese)
https://trail.t.u-tokyo.ac.jp/ja/blog/22-12-02-clip/
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
https://arxiv.org/abs/2409.19291
Exploring CLIP alternatives
https://www.elastic.co/search-labs/blog/openai-clip-alternatives
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
https://arxiv.org/abs/2502.14786