InstructPix2Pix

https://gyazo.com/4b87b01f72c7c352f3f0e96197e53072

17 Nov 2022, arXiv:2211.09800, InstructPix2Pix: Learning to Follow Image Editing Instructions

Tim Brooks, Aleksander Holynski, Alexei A. Efros

University of California, Berkeley

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models, a language model (GPT-3) and a text-to-image model (Stable Diffusion), to generate a large dataset of image editing examples. Our conditional diffusion model, InstructPix2Pix, is trained on our generated data, and generalizes to real images and user-written instructions at inference time. Since it performs edits in the forward pass and does not require per-example fine-tuning or inversion, our model edits images quickly, in a matter of seconds. We show compelling editing results for a diverse collection of input images and written instructions.

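DeepL: We propose a method for editing images from human instructions

Input image

A written instruction that says what to do

Output

The image edited according to that instruction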

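Features of this model

Edits images quickly, in a matter of seconds

Because it performs the edit in the forward pass and needs no per-example fine-tuning or inversion

Shows compelling editing results for a diverse range of input images and written instructions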

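To obtain training data for this problem, two large pretrained models

A language model (GPT-3)

A text-to-image model (Stable Diffusion)

are combined to generate a large dataset of image editing examples.

The conditional diffusion model InstructPix2Pix, trained on the generated data, generalizes to real images and user-written instructions at inference time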
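The data-generation idea above can be sketched roughly as follows. This is only an illustration under assumptions, not the authors' pipeline: `propose_edit` stands in for the language model (GPT-3) and `render_pair` for the text-to-image model (Stable Diffusion). Both names, the `EditExample` type, and the stub functions are hypothetical. The sketch assumes the language model turns a caption into an edit instruction plus an edited caption, and that the image model renders a before/after pair from the two captions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class EditExample:
    """One training triplet: instruction, image before, image after."""
    instruction: str
    input_image: Any
    edited_image: Any


# Hypothetical interfaces (assumptions, not from the paper's code):
# caption -> (edit instruction, edited caption)
ProposeEdit = Callable[[str], Tuple[str, str]]
# (caption, edited caption) -> (image before, image after)
RenderPair = Callable[[str, str], Tuple[Any, Any]]


def build_dataset(
    captions: Iterable[str],
    propose_edit: ProposeEdit,
    render_pair: RenderPair,
) -> List[EditExample]:
    """Combine a language model and an image model into editing examples."""
    examples = []
    for caption in captions:
        instruction, edited_caption = propose_edit(caption)
        before, after = render_pair(caption, edited_caption)
        examples.append(EditExample(instruction, before, after))
    return examples


# Stubs so the sketch runs without any model; real models would replace them.
def fake_propose_edit(caption: str) -> Tuple[str, str]:
    return "make it snowy", caption + " in the snow"


def fake_render_pair(caption: str, edited_caption: str) -> Tuple[str, str]:
    return f"<image: {caption}>", f"<image: {edited_caption}>"


dataset = build_dataset(["a photo of a dog"], fake_propose_edit, fake_render_pair)
```

Passing the two models in as callables keeps the sketch independent of any particular API. The conditional diffusion model would then be trained to map `input_image` plus `instruction` to `edited_image`.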

via https://twitter.com/_akhaliq/status/1593420088820092929

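@birdMan710Nika: I read InstructPix2Pix, the text-based image editing paper

It generates a dataset with GPT3+prompt2prompt, then trains the model to edit an input image according to a text instruction

This approach looks smarter than Imagic, so I'm looking forward to the demo 🥰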

https://arxiv.org/pdf/2211.09800

https://pbs.twimg.com/media/Fh4TQDCVQAEnMLT.jpg https://pbs.twimg.com/media/Fh4TQb7UcAADb28.png https://pbs.twimg.com/media/Fh4TQ2RVQAEM82E.jpg

Implementation

https://github.com/timothybrooks/instruct-pix2pix
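As a rough way to try the "input image + instruction in, edited image out" behaviour, here is a minimal sketch using the Hugging Face `diffusers` pipeline for this model. It assumes `torch`, `diffusers`, and `Pillow` are installed and that the `timbrooks/instruct-pix2pix` checkpoint can be downloaded. The helper name `edit_image`, its default values, and the file names are illustrative choices, not recommendations from the paper.

```python
def edit_image(
    image_path: str,
    instruction: str,
    steps: int = 20,
    image_guidance_scale: float = 1.5,
    guidance_scale: float = 7.5,
):
    """Edit one image from a written instruction and return a PIL image.

    Heavy imports are deferred so that defining this function does not
    require torch or diffusers; they are needed only when it is called.
    """
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInstructPix2PixPipeline

    # Use half precision on GPU, full precision on CPU (assumed setup).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix", torch_dtype=dtype
    ).to(device)

    image = Image.open(image_path).convert("RGB")
    result = pipe(
        instruction,
        image=image,
        num_inference_steps=steps,
        # How closely to stay to the input image vs. follow the text.
        image_guidance_scale=image_guidance_scale,
        guidance_scale=guidance_scale,
    )
    return result.images[0]


# Example call (hypothetical file names):
# edit_image("input.jpg", "make it snowy").save("output.jpg")
```

The edit is a single sampling run with no per-image fine-tuning or inversion step, which matches the "matter of seconds" claim when a GPU is available.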