CLIP - yuyan

CLIP

CLIP: A Multimodal Foundation Model for Language and Images

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

Exploring CLIP alternatives

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features