Image generation models

well-known models

DALL-E 2

Stable Diffusion

SDXL

ref. 世界を変えた画像生成AI、さらに進化「Stable Diffusion XL（SDXL）」いよいよ正式公開 - 週刊アスキー

Midjourney

NovelAI

ControlNet

other models

Muse_v1

MUSE_v1 - v1 | Stable Diffusion Checkpoint | Civitai

classes

diffusion model

ja: 拡散モデル、拡散確率モデル

The current mainstream approach.

Generates an image by starting from noise and repeatedly denoising it.
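A minimal sketch of the noising/denoising idea, assuming a DDPM-style linear noise schedule (the schedule values and the names `alpha_bar`, `add_noise`, `remove_noise` are illustrative assumptions, not from any library). A real model predicts the noise with a neural network and denoises over many small steps; here the true noise is passed in as an "oracle" so that only the arithmetic is shown.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t (assumed linear schedule)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, rng):
    """Forward process: jump from the clean x0 directly to the noisy x_t."""
    ab = alpha_bar(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps

def remove_noise(xt, eps_pred, t):
    """Estimate x0 from x_t, given a prediction of the noise that was added."""
    ab = alpha_bar(t)
    return [(x - math.sqrt(1.0 - ab) * e) / math.sqrt(ab)
            for x, e in zip(xt, eps_pred)]

rng = random.Random(0)
x0 = [0.5, -1.0, 0.25]                 # stand-in for image pixels
xt, eps = add_noise(x0, t=500, rng=rng)
x0_hat = remove_noise(xt, eps, t=500)  # oracle: the true noise is used here
```

With a perfect noise prediction, `x0_hat` matches `x0` up to floating-point error; training a diffusion model amounts to learning that prediction.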

GAN

VAE

Flow

comparison

fig.

https://res.cloudinary.com/zenn/image/fetch/s--JshVPvad--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_1200/https://lilianweng.github.io/lil-log/assets/images/generative-overview.png

Terminology

fine-tuning

LoRA

abbr. Low-Rank Adaptation

⊂ Adapter tuning

A kind of fine-tuning.
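A rough sketch of the low-rank idea, in plain Python lists rather than a real framework (all names here are hypothetical). The pretrained weight W stays frozen and only two small matrices B and A are trained; the effective weight is W + (alpha / r) * B·A. Initialising B to zero, so that training starts from the unmodified model, follows the usual LoRA convention.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(w, a, b, alpha):
    """Effective weight W + (alpha / r) * B @ A, where r is the rank."""
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. the two low-rank factors."""
    return d_out * d_in, r * (d_out + d_in)

rng = random.Random(0)
d_out, d_in, r = 6, 8, 2
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
b = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init

w_eff = lora_weight(w, a, b, alpha=2.0)  # equals w until B is updated
```

For a 768x768 layer at rank 4, `param_counts(768, 768, 4)` gives 589,824 weights for full fine-tuning versus 6,144 for the low-rank factors, which is why LoRA files are small and training is cheap.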

ref.

GitHub - cloneofsimo/lora: Using Low-rank adaptation to quickly fine-tune diffusion models.

LoRA Training Guide

LoRA - としあきdiffusion Wiki*

LoRAを使った学習のやり方まとめ！好きな絵柄・キャラクターのイラストを生成しよう【Stable Diffusion】 | くろくまそふと

text-to-image

abbr. text2image, t2i

image-to-image

abbr. i2i

depth-to-image

ref.

Stable Diffusion Depth-to-Imageモデルを学習なしで特定のドメインに適応させる

ControlNet

model: pose → image

Quantization

Computation also works with 8-bit or 16-bit integers; if anything, it is faster.

Quantization — PyTorch 2.0 documentation
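A minimal sketch of symmetric int8 quantization in plain Python, to show what "computing with integers" means. This is not the PyTorch API; the function names are hypothetical, and real toolkits add per-channel scales, zero points, and calibration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [x * scale for x in q]

def int8_dot(qa, sa, qb, sb):
    """Dot product accumulated in integers, rescaled to float once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * sa * sb

weights = [0.8, -1.27, 0.05, 0.4]
inputs = [1.0, 0.5, -2.0, 0.25]
qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(inputs)
approx = int8_dot(qw, sw, qx, sx)                    # integer arithmetic inside
exact = sum(w * x for w, x in zip(weights, inputs))  # float reference
```

The inner loop only multiplies and adds small integers, which is the part hardware can run faster; the cost is a rounding error of at most half a scale step per value.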

BF16

bfloat16 の数値形式 | Cloud TPU | Google Cloud

Reportedly speeds up fine-tuning.

https://twitter.com/alfredplpl/thread/1661853121679867904

@alfredplpl: [Sad news] I realized I had been using FP32 for fine-tuning Stable Diffusion all along; switching to FP16 made it 3x faster. A bit late to notice.

@alfredplpl: [Good news] Switching to BF16 made it even faster.
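For reference, a small sketch of what BF16 is at the bit level: the top 16 bits of an FP32 value (1 sign, 8 exponent, 7 mantissa bits), so it keeps FP32's range while giving up precision. FP16 (1/5/10 bits) has more precision but overflows above roughly 65,504. The helper names are hypothetical, and truncation is used here for simplicity where real hardware typically rounds.

```python
import struct

def float32_bits(x):
    """Return the 32-bit IEEE 754 single-precision encoding of x as an int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_bfloat16_bits(x):
    """Keep the top 16 bits of the float32 encoding (truncation, no rounding)."""
    return float32_bits(x) >> 16

def from_bfloat16_bits(b):
    """Expand 16 bfloat16 bits back into a Python float."""
    return struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))[0]

def fits_in_float16(x):
    """True if x can be packed as IEEE half precision without overflow."""
    try:
        struct.pack(">e", x)
        return True
    except OverflowError:
        return False

big = 1.0e5
as_bf16 = from_bfloat16_bits(to_bfloat16_bits(big))  # coarse but finite
fp16_ok = fits_in_float16(big)                       # out of FP16 range
```

A value such as 1e5 survives the trip through BF16 with under 1% error but cannot be represented in FP16 at all, which is one reason BF16 training usually needs less loss-scaling care than FP16.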

Completion (filling in)

inpaint

outpaint

tooling

GUI

GitHub - AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI

reverse conversion: image → prompt

How to convert image to prompt?

Image to Prompt: Convert Img to text

No app available?

prompt engineering

methexis-inc/img2prompt – Run with an API on Replicate

GitHub - pharmapsychotic/clip-interrogator: Image to prompt with BLIP and CLIP

ref.

AI画像生成・生成系AI 問題まとめwiki

用語集 - AI画像生成・生成系AI 問題まとめwiki

拡散モデル - Wikipedia

What are Diffusion Models? | Zenn

Original (en): Diffusion Models

Civitai | Stable Diffusion models, embeddings, LoRAs and more

What the heck is Civitai? | Civitai

#machine_learning