VLM
Short for Vision Language Model.
Covers tasks such as Text to Image and Image to Text.
https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation
https://github.com/THUDM/CogVLM
https://www.reddit.com/r/LocalLLaMA/comments/18wyevu/best_uncensored_multimodal_vision_model/