LLaVA-OneVision
https://llava-vl.github.io/blog/2024-08-05-llava-onevision/Project
https://github.com/LLaVA-VL/LLaVA-NeXTLLaVA-VL/LLaVA-NeXT
https://arxiv.org/abs/2408.03326LLaVA-OneVision: Easy Visual Task Transfer
https://gyazo.com/e9c1adf1180cbaa19b01213e3a824abb
単体画像・複数枚画像・ビデオの入力が可能なVLM
視覚エンコーダーにsiglip-so400m、LLMにQwen2