Notes on working with torch_geometric

In PyG, it is important to be able to move back and forth between two representations: sparse and dense.

In general, the number of nodes $V$ and the number of edges $E$ differ from sample to sample.

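In a mini-batch, the variable $V$ and $E$ are not handled by padding. Instead, as shown below, the graphs are simply stacked together into one large graph (the sparse representation).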

$ A = \mathrm{diag}(A_1, \ldots, A_n), \quad X = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}, \quad Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} $

For each row of the stacked $X$ and $Y$, the index of the graph in the mini-batch that it came from is kept in $\mathrm{batch}$.

$ \mathrm{batch} = [0 \ \cdots \ 0 \ \ 1 \ \cdots \ n-2 \ \ n-1 \ \cdots \ n-1]^{\top} $

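Using $\mathrm{batch}$, the sparse representation can be converted into dense representations that are padded with respect to $V$ and $E$: $\mathbb{R}^{B \times V_\mathrm{pad} \times d}$ and $\mathbb{R}^{B \times V_\mathrm{pad} \times V_\mathrm{pad}}$, where $B = n$ is the number of graphs in the mini-batch.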

to_dense_batch

Converts node features into the dense representation $\mathbb{R}^{B \times V_\mathrm{pad} \times d}$

to_dense_adj

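Converts edge information (the adjacency given by edge_index, optionally with edge features) into the dense representation $\mathbb{R}^{B \times V_\mathrm{pad} \times V_\mathrm{pad}}$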

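tips: When you create a new feature, it has to be stored in the dataloader (dataset) in the sparse representation. The default is always sparse. If you want a dense representation inside the model's forward, use the to_dense_* functions above.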

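Aside

When applying attention, the dense representation is used (in most cases)

When using MPNN-type GNNs, the sparse representation is used (in most cases)

When using both attention and MPNN (here $f$ is the sparse-to-dense conversion, e.g. to_dense_batch, and $f^{-1}$ is its inverse)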

$h_\mathrm{sparse}$

→ $h_\mathrm{dense} := f(h_\mathrm{sparse})$

→ $h'_\mathrm{dense} = \mathrm{attn}(h_\mathrm{dense})$

→ $h'_\mathrm{sparse} = f^{-1}(h'_\mathrm{dense}) + \mathrm{MPNN}(h_\mathrm{sparse})$

Recipe for a General, Powerful, Scalable Graph Transformer

rampasek/GraphGPS