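Generating images with the diffusers library

Setting up the Web UI has been slow going, so I run a diffusion model from code instead and try out text2img.

11/20 update: img2img now works as well.

11/21 update: finished setting up the Web UI on my machine!

4/16 update: looked into how to use LoRA and ControlNet.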

[/icons/hr.icon]

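About text2img

Environment

Google Colaboratory

Stable-Diffusion-v1-5 (the model the code below loads)

Python 3.10.12 (check with !python --version)

Note: under "Change runtime type", select "T4 GPU".

Install the required libraries

!pip install diffusers transformers accelerate omegaconf pytorch_lightning

accelerate apparently shortens the processing time.

For PyTorch, look up the build that matches your CUDA version and install that one.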
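Before going further it can help to confirm what actually got installed. The snippet below is a minimal sketch that only uses the standard library; the helper name installed_versions is made up for this note. If torch is installed, torch.version.cuda should also show which CUDA version that build targets.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the libraries used in this note (hypothetical helper, not part of diffusers)
for name, version in installed_versions(
    ["torch", "diffusers", "transformers", "accelerate"]
).items():
    print(f"{name}: {version or 'not installed'}")
```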

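Try text2img from code

code:python
 import torch
 from diffusers import AutoPipelineForText2Image
 # Model to load from the Hugging Face Hub
 model_path = "runwayml/stable-diffusion-v1-5"
 # Load the weights in half precision and move the pipeline to the GPU
 pipe_t2i = AutoPipelineForText2Image.from_pretrained(
     model_path, torch_dtype=torch.float16
 ).to("cuda")
 # Generate an image
 prompt = "stable diffusion"
 image = pipe_t2i(prompt=prompt, height=512, width=768).images[0]
 # Display the image (display() is available in Colab/Jupyter notebooks)
 display(image)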

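Set model_path to the model you want to use (apparently only certain models can be loaded this way).

Set the text in prompt. The model specified in the code above does not support Japanese, so the prompt has to be written in English.

display() shows the image.

image.save("path/to/image") saves the image.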
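One thing to watch with height and width: as far as I know, the Stable Diffusion pipelines in diffusers reject sizes that are not divisible by 8. The sketch below rounds an arbitrary size down to the nearest valid one; fit_size and round_down are names invented for this note, not diffusers functions.

```python
def round_down(value, multiple=8):
    """Round value down to a multiple, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def fit_size(width, height, multiple=8):
    """Return (width, height) adjusted so both are divisible by `multiple`."""
    return round_down(width, multiple), round_down(height, multiple)


# Example: a 770x515 request becomes 768x512
width, height = fit_size(770, 515)
print(width, height)
```

The result can then be passed as width=width, height=height in the pipeline call.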

Note: to write code on Scrapbox as above, type "code:language-name".

[/icons/hr.icon]

About img2img

Note: the input image is transformed according to a text prompt.

Environment

Google Colaboratory

Stable-Diffusion-v1-4

Python 3.10.12

Install what is needed

!pip install diffusers==0.12.1 transformers==4.19.2 ftfy accelerate

Run the following code, which creates the pipeline

code:python
 # Create the pipeline
 import torch
 from diffusers import StableDiffusionImg2ImgPipeline
 model_id = "CompVis/stable-diffusion-v1-4"
 device = "cuda"
 # Load the half-precision weights from the "fp16" branch and move the pipeline to the GPU
 pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
 pipe = pipe.to(device)

Prepare the image you want to transform

The image used here is "shirt1.jpg"

Mount Google Drive beforehand

Put the image in "/content/drive/MyDrive/img" (any location works)
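The mount step and a quick check that the image is really where the code expects it could look roughly like this. It is a sketch: mount_drive_if_colab and require_image are helper names invented for this note, and the google.colab module only exists inside Colab, so the import is guarded.

```python
from pathlib import Path


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return False anywhere else."""
    try:
        from google.colab import drive  # this module only exists inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def require_image(path):
    """Return the path as a Path object, or raise if the file is missing."""
    image_path = Path(path)
    if not image_path.is_file():
        raise FileNotFoundError(f"Input image not found: {image_path}")
    return image_path
```

In Colab, calling mount_drive_if_colab() and then require_image("/content/drive/MyDrive/img/shirt1.jpg") fails early with a clear message if the file name or folder is wrong.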

Run the following code, which performs img2img

code:python
 from PIL import Image
 # Load the input image and show it before the conversion
 init_img = Image.open("/content/drive/MyDrive/img/shirt1.jpg")
 display(init_img)
 # Prompt describing the desired result
 prompt = "Full body image of the man wearing this shirt."
 # Fix the seed so the result is reproducible
 generator = torch.Generator(device).manual_seed(42)
 # Run the pipeline
 with torch.autocast("cuda"):
     image = pipe(prompt, image=init_img, guidance_scale=7.5, strength=0.75, generator=generator).images[0]
 # Save the converted image
 image.save("changed_shirt1.png")

The converted image is saved in "/content".
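About strength=0.75 in the pipeline call: my understanding is that img2img adds noise to the input image and then runs only part of the denoising schedule, so a higher strength moves the result further from the input. The sketch below mirrors how I believe diffusers derives the number of steps actually run (the exact formula may differ between versions); img2img_steps is a name invented for this note.

```python
def img2img_steps(num_inference_steps=50, strength=0.75):
    """Estimate how many denoising steps img2img runs for a given strength.

    Assumption: mirrors my reading of the diffusers img2img pipeline,
    which is not guaranteed to match every release.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


# With 50 scheduled steps and strength=0.75, about 37 steps are run
print(img2img_steps(50, 0.75))
```

At strength=1.0 the whole schedule runs and the input image is mostly ignored; values near 0 leave the image almost unchanged.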

[/icons/hr.icon]

2024/04/16 update

Goal: make it possible to apply an already-trained LoRA model to image generation

Keep the LoRA models together in the ./lora directory
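A rough sketch of how the ./lora layout could be used. It assumes the LoRA files are stored as .safetensors and that the installed diffusers is recent enough to provide pipe.load_lora_weights (as far as I know, the 0.12.1 pinned in the img2img section predates it). find_lora_files and apply_lora are names invented for this note.

```python
from pathlib import Path


def find_lora_files(lora_dir="./lora"):
    """Return the .safetensors files in lora_dir, sorted by name."""
    return sorted(Path(lora_dir).glob("*.safetensors"))


def apply_lora(pipe, lora_path):
    """Load one LoRA file into an already-created diffusers pipeline."""
    lora_path = Path(lora_path)
    # load_lora_weights takes the directory plus the file name inside it
    pipe.load_lora_weights(str(lora_path.parent), weight_name=lora_path.name)
    return pipe
```

With a pipeline such as pipe_t2i already created, apply_lora(pipe_t2i, find_lora_files()[0]) would load the first LoRA in the directory before generating.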

Sites I referred to

yossymura_3d, "Stable Diffusion（Diffusers）による画像生成の効率化と、基本的な使い方まとめ" https://note.com/yossymura/n/n64b421ffd927

GAMMASOFT, "Stable Diffusionで画像をテキストで修正する方法" https://gammasoft.jp/blog/image-to-image-with-stable-diffusion/

#yuma