Editing JoyTag output tags with an MLLM
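Almost every MLLM is strait-laced, so when asked to write a caption it often does not know the names of anime characters, or it ignores NSFW elements entirely
By contrast, WD14-tagger and the newcomer JoyTag assign quite specific tags, but they only list words, which cannot be said to fully describe the image
So, by asking an MLLM to describe the "input image" using the "tags generated by JoyTag", we combine the strengths of both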
https://gyazo.com/c5a70e3ebccb6d9d13b411bef873d60b
🟨Generate tags with JoyTag
🟩Combine the text "Edit the tags inside the “” as much as possible and generate a caption that includes the location information of the objects in the image." with the generated tags, pass the result to InternLM-XComposer2-VL, and have it generate the prompt
code:JoyTag
neon genesis evangelion, 1girl, souryuu asuka langley, blue eyes, solo, long hair, orange hair, dress, sky, day, hair between eyes, blue sky, outdoors, sitting, feet out of frame, short sleeves, blue dress, bangs, looking at viewer, parted lips, shirt, cloud, skirt, railing, from below, building, breasts, floating hair, blue skirt, rebuild of evangelion, open mouth, collarbone, border, looking down, black dress, stairs, white shirt, chromatic aberration, medium breasts, standing
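The merge of the instruction text and the JoyTag output can be sketched roughly as follows. This is only an illustration: the function name `build_edit_prompt` is hypothetical, and placing the tags inside curly quotes after the instruction is an assumption about how the two pieces were joined. Only the instruction wording comes from this note.

```python
# Instruction text quoted from this note.
INSTRUCTION = (
    "Edit the tags inside the “” as much as possible and generate a caption "
    "that includes the location information of the objects in the image."
)


def build_edit_prompt(tags: str) -> str:
    """Join the instruction and a comma-separated JoyTag string.

    Assumption: the tags are appended after the instruction, wrapped in
    curly quotes. Empty entries and stray whitespace are dropped.
    """
    cleaned = ", ".join(t.strip() for t in tags.split(",") if t.strip())
    return f"{INSTRUCTION} “{cleaned}”"


if __name__ == "__main__":
    print(build_edit_prompt("1girl, blue eyes, long hair, orange hair"))
```

The resulting string is what would be sent to the MLLM together with the image. How the image is attached depends on the model's own interface and is not shown here.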
code:Edited by InternLM-Xcomposer2-VL
The image portrays a girl from the anime series "Neon Genesis Evangelion" named Asuka Langley Souryuu. She has blue eyes and long orange hair. She is wearing a blue dress with short sleeves, her feet are out of frame, and she appears to be standing on stairs. The background shows buildings, a clear blue sky, and clouds, indicating that the scene takes place outdoors during daytime. Asuka's hair floats in the air, adding to the dramatic effect of the image. Her gaze is directed towards the viewer, and she has parted lips as if she is speaking or expressing surprise.
code:llava-v1.6-mistral-7b
The image shows an animated character, a young woman with long hair, standing on a staircase. She is wearing a dark dress and has her hair tied back. In the background, there's a cityscape with tall buildings under a blue sky. The character appears to be looking downwards, possibly towards the viewer or something below the stairs.
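InternLM-XComposer2-VL follows instructions so well you would think it was GPT-4
In exchange it is very heavy: even with low_memory it barely fits in 12 GB of VRAM
Part of llava's appeal is said to be how well llama.cpp is optimized, so I hope it can somehow be converted to gguf and run on llama.cpp
I may have grown so used to GPT-4's comprehension that I underestimated the power of prompt engineering nomadoor.icon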
code:prompt
This is a chat between an inquisitive human and an AI assistant. Assume the role of the AI assistant. Read all the images carefully, and respond to the human's questions with informative, helpful, detailed and polite answers.
The tags written after “Tag:” are descriptions of this image. Please rationally and concisely create a detailed caption using these tags as much as possible, including the location information of the subjects and the camera angle in the image. Tags:
(JoyTag tags go here)
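A minimal sketch of filling in the template above, assuming the tags arrive as a Python list. The two text constants are copied from the prompt in this note. The function name `build_llava_prompt` and the de-duplication step are hypothetical additions for illustration, not part of the original workflow.

```python
# Both constants are copied verbatim from the prompt in this note.
SYSTEM = (
    "This is a chat between an inquisitive human and an AI assistant. "
    "Assume the role of the AI assistant. Read all the images carefully, "
    "and respond to the human's questions with informative, helpful, "
    "detailed and polite answers."
)
INSTRUCTION = (
    "The tags written after “Tag:” are descriptions of this image. "
    "Please rationally and concisely create a detailed caption using these "
    "tags as much as possible, including the location information of the "
    "subjects and the camera angle in the image. Tags:"
)


def build_llava_prompt(tags: list[str]) -> str:
    """Place the JoyTag tags on the last line of the template.

    Assumption: duplicate or empty tags are dropped, keeping first-seen order.
    """
    unique = dict.fromkeys(t.strip() for t in tags if t.strip())
    return f"{SYSTEM}\n{INSTRUCTION}\n{', '.join(unique)}"


if __name__ == "__main__":
    print(build_llava_prompt(["1girl", "blue eyes", "from below", "1girl"]))
```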
code:llava-v1.6-mistral-7b
In the image, we see Souryuu Asuka Langley, a character from the anime series Neon Genesis Evangelion. She stands confidently on a staircase, her long blonde hair cascading down her shoulders. Her attire consists of a blue dress paired with a white shirt, adding a pop of color against the backdrop of the cityscape. The sky above is a clear blue, suggesting it's a beautiful day outside. Her gaze is directed towards the viewer, creating an engaging connection with the audience. The camera angle gives us a unique perspective from below, emphasizing her height and the grandeur of the city around her. The building in the background adds depth to the scene, while the railing provides a sense of security and stability. The overall composition of the image captures the essence of Asuka's character, showcasing her strength and confidence in a vibrant urban setting.
https://gyazo.com/09c41afab76fed9eb48ffdd393729cc9