ControlNet

https://github.com/lllyasviel/ControlNet

2302.05543 Adding Conditional Control to Text-to-Image Diffusion Models

10 Feb 2023

https://gyazo.com/e42578dd01e93bb1716bc52b3c6e3f68

https://gyazo.com/d5f0320ca7340720d568e4147577c94e

https://gyazo.com/d646926b60b693b65932ee48698ba8b7

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang (lllyasviel), Maneesh Agrawala

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.

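A toy sketch of how the paper wires a ControlNet to a locked model, assuming its formulation y = F(x) + Z(F_copy(x + Z(c))): F is a frozen pretrained block, F_copy a trainable copy of it, and Z a "zero convolution" (a 1×1 convolution whose weight and bias start at zero). Everything here is a hypothetical stand-in: vectors are plain lists, and `Block`, `ZeroConv` and `ControlledBlock` are illustrative names, not code from the ControlNet repository. The point it shows is why training starts from the unmodified model: while both zero convolutions are still zero, the condition c has no effect on the output.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """A 1x1 convolution at one spatial position is an affine map over channels."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class ZeroConv:
    """Hypothetical 1x1 'zero convolution': weight and bias both start at zero."""

    def __init__(self, channels: int):
        self.weight: Matrix = [[0.0] * channels for _ in range(channels)]
        self.bias: Vector = [0.0] * channels

    def __call__(self, x: Vector) -> Vector:
        return affine(self.weight, self.bias, x)


class Block:
    """Stand-in for a pretrained network block: affine map followed by ReLU."""

    def __init__(self, weight: Matrix, bias: Vector):
        self.weight = weight
        self.bias = bias

    def __call__(self, x: Vector) -> Vector:
        return [max(0.0, v) for v in affine(self.weight, self.bias, x)]

    def copy(self) -> "Block":
        # Copy the parameters so updating the copy never touches the original.
        return Block([row[:] for row in self.weight], self.bias[:])


class ControlledBlock:
    """Toy version of y = F(x) + Z_out(F_copy(x + Z_in(c)))."""

    def __init__(self, locked: Block, channels: int):
        self.locked = locked              # pretrained block, kept frozen
        self.trainable = locked.copy()    # trainable copy, initialised from F
        self.z_in = ZeroConv(channels)    # injects the condition c
        self.z_out = ZeroConv(channels)   # feeds the copy's output back

    def __call__(self, x: Vector, c: Vector) -> Vector:
        base = self.locked(x)
        cond_in = [xi + ci for xi, ci in zip(x, self.z_in(c))]
        control = self.z_out(self.trainable(cond_in))
        return [b + d for b, d in zip(base, control)]


locked = Block([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])
block = ControlledBlock(locked, channels=2)
x = [1.0, 2.0]
print(block(x, [5.0, -3.0]) == locked(x))  # True: the condition is ignored at init
```

Once the zero-convolution weights move away from zero during training, the condition starts to shift the output, while the locked block itself is never modified.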

A pix2pix for a new era? ControlNet explained

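Many diffusion models (latent diffusion models in particular) do not faithfully take the input conditioning into account, so the results of this paper are quite impactful. Personally, it feels like GAN-era pix2pix carried straight over into a diffusion model and powered up.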

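Of course, pix2pix-style models such as Taming Transformers and the diffusion-based Palette already existed, but this model builds on Stable Diffusion, which is widely used today, so it should be very convenient. (ControlNet can also be applied to models other than Stable Diffusion.) It may well become the base for a wide range of tasks going forward.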

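This paper introduces ControlNet, which lets a large diffusion model such as Stable Diffusion be trained for individual tasks. With this method, a variety of tasks can be learned, such as generating images from Canny edges, Hough lines, or human pose information. In addition, a model trained on a personal machine (e.g. an Nvidia RTX 3090 Ti) was reportedly competitive with models trained on large computation clusters.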
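For the edge-conditioned models, the paper builds training pairs by running an edge detector over ordinary images, so the condition is a binary edge map of the target image. The sketch below produces such a map from a grayscale image given as nested lists. It is a simplified stand-in for Canny (Sobel gradient magnitude with a single threshold, and no Gaussian smoothing, non-maximum suppression or hysteresis); `edge_map` and its `threshold` default are illustrative choices, and in practice OpenCV's `cv2.Canny` would be the usual tool.

```python
from typing import List

Image = List[List[float]]  # grayscale, values in [0, 255]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_map(img: Image, threshold: float = 100.0) -> List[List[int]]:
    """Binary edge map (0 or 255) from Sobel gradient magnitude.

    Simplified stand-in for Canny; border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            # Convolve the 3x3 neighbourhood with both Sobel kernels.
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                out[y][x] = 255
    return out


# A vertical step from black (0) to white (255).
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
for row in edge_map(img):
    print(row)  # rows 1-3: [0, 0, 255, 255, 0, 0]; rows 0 and 4: all zeros
```

The step produces edge pixels in the two columns on either side of the boundary, while a flat image produces none.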

https://gyazo.com/67912174af8b08cfa458d939ed9db466

/work4ai/controlnet

Applications

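@TDS_95514874: Using ControlNet's Normal mode, I convert a 3D model into an illustration / anime style, then paste the result back onto the original 3D model as a texture.

Normal mode captures fine structural detail well, so the texturing comes out quite accurate.

With a little tweaking, it looks good enough that even shape keys could probably be reused.

#aiart #b3d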

https://video.twimg.com/ext_tw_video/1626287337310392327/pu/vid/720x422/ubc0Lx9bYQ_Cr_vm.mp4?tag=12#.mp4
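The Normal mode takes a normal map as its condition; in a workflow like the one above, that image would presumably be a normal pass rendered from the 3D model (e.g. in Blender). A normal map stores each pixel's unit surface normal as a colour, commonly via RGB = (n + 1) / 2 × 255 per component. The function below is a hypothetical helper showing only that encoding. Which axis maps to which channel, and the sign of the y axis, differ between tools (OpenGL-style vs DirectX-style), so the convention the ControlNet normal model expects is an assumption to check, not something this sketch settles.

```python
import math
from typing import Tuple


def encode_normal(nx: float, ny: float, nz: float) -> Tuple[int, int, int]:
    """Map a surface normal to an 8-bit RGB triple using (n + 1) / 2 * 255.

    Assumes x -> R, y -> G, z -> B; real tools may flip or swap axes.
    """
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("normal must be non-zero")
    # Normalise first so every component lies in [-1, 1].
    unit = (nx / length, ny / length, nz / length)
    r, g, b = (round((c + 1.0) * 0.5 * 255) for c in unit)
    return r, g, b


# A surface facing the camera gives the familiar light-blue normal-map colour.
print(encode_normal(0.0, 0.0, 1.0))  # (128, 128, 255)
```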