CLIP
CLIP: A Multimodal Foundation Model for Language and Images
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Exploring CLIP alternatives
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features