FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction

2019/11/13

https://gyazo.com/6d6ece2b2f1867b84a587f2820d74414

Weibo is roughly the Chinese equivalent of Twitter plus Facebook

Abstract

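For internet companies such as Facebook and Weibo, advertising and feed ranking are important

CTR is a key metric there

There are many CTR prediction models, starting with logistic regression

Many models compute feature interactions simply with a Hadamard product or inner product, without taking feature importance into account

The paper proposes FiBiNET, short for Feature Importance and Bilinear feature Interaction NETwork

It learns feature importance dynamically and models fine-grained feature interactions

Feature importance is learned dynamically with the Squeeze-Excitation network (SENET) mechanism

Feature interactions are learned effectively with a bilinear function

Experiments are run on two real-world datasets

The shallow model is shown to outperform FM and FFM

The deep FiBiNET outperformed DeepFM and xDeepFM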

INTRODUCTION

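CTR prediction matters at internet companies such as Facebook and Sina Weibo

Many methods have been proposed, including LR, Poly2, and factorization machines

More recently, various deep methods such as xDeepFM have also been proposed

A good blog post on these: https://data.gunosy.io/entry/deep-factorization-machines-2018

FiBiNET learns feature importance dynamically and models fine-grained feature interactions

Different features carry different importance depending on the task

For example, when predicting income, occupation matters more than hobbies (obviously)

The Squeeze-and-Excitation network (SENET) learns feature weights dynamically

Modeling feature interactions is a key point in CTR prediction, yet most methods stop at a simple Hadamard product or inner product

The paper proposes a finer-grained approach based on a bilinear function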

contributions

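Inspired by SENET from computer vision, the model learns feature weights dynamically

Three types of Bilinear-Interaction layer are introduced for computing feature interactions

A shallow model combining SENET and bilinear feature interaction outperformed FFM on the Criteo and Avazu datasets

(a comparison among shallow models)

The deep FiBiNET outperformed state-of-the-art deep methods on the Criteo and Avazu datasets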

RELATED WORK

Factorization Machine and Its relevant variants

Deep Learning based CTR Models

SENET Module

Won ILSVRC 2017 (stronger than ResNet, roughly speaking)

OUR PROPOSED MODEL

https://gyazo.com/8fa61b5ca2ecffaddc99e6f07a5320b1

Sparse Input and Embedding layer

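The sparse input layer and embedding layer are the ones widely used in deep CTR prediction models

The sparse input layer holds a sparse representation of the raw features (as the name says)

The embedding layer turns the sparse representation into dense vectors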
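A minimal sketch of these two layers in plain Python. The vocabulary sizes, the embedding size k, and the helper names make_embedding_table and embed are made up for illustration; none of them come from the paper.

```python
import random

def make_embedding_table(vocab_size, dim, seed=0):
    """One dense vector per category value of a single field (random init)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]

def embed(sample, tables):
    """sample holds one category index per field (the sparse input);
    the result is one dense k-dimensional vector per field."""
    return [tables[field][index] for field, index in enumerate(sample)]

vocab_sizes = [5, 3, 7]   # hypothetical: three categorical fields
k = 4                     # hypothetical embedding size
tables = [make_embedding_table(v, k, seed=i) for i, v in enumerate(vocab_sizes)]

E = embed([2, 0, 6], tables)   # f = 3 fields, each mapped to a length-k vector
```

Looking a row up by index is equivalent to multiplying a one-hot vector by the embedding matrix, which is why the sparse input never has to be materialized.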

SENET Layer

The goal is to learn a weight for each feature

Squeeze-and-Excitation Networks

There are three steps

https://gyazo.com/0144c6b507e1b165fb7a5ecf6f4002bb

Squeeze

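Compute a summary statistic (max or mean) of each embedding vector (literal translation), i.e. compress each vector to obtain global information

In the paper's experiments, mean pooling reportedly worked better than max pooling

The paper describes the original SENET squeeze as max pooling (the original SENet paper itself uses global average pooling)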

https://gyazo.com/e9c10447e2752d51cd04cc21d179bd01
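A small sketch of the squeeze step, assuming each field embedding is a plain list of floats. The function name squeeze and the mode argument are illustrative only.

```python
def squeeze(E, mode="mean"):
    """Compress each field embedding (length k) into one scalar statistic."""
    if mode == "mean":
        return [sum(e) / len(e) for e in E]
    if mode == "max":
        return [max(e) for e in E]
    raise ValueError("mode must be 'mean' or 'max'")

# Two fields with k = 3
E = [[1.0, 2.0, 3.0],
     [0.0, -2.0, 2.0]]

Z_mean = squeeze(E)          # [2.0, 0.0]
Z_max = squeeze(E, "max")    # [3.0, 2.0]
```

The output Z has one entry per field, so its length is f regardless of the embedding size k.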

Excitation

This is where the weights are learned

Two fully connected (FC) layers

a dimensionality-reduction layer

a dimensionality-increasing layer

https://gyazo.com/2948761422f4c6c503fcd989b5503054

Z is the statistics vector obtained from the Squeeze step

W1 and W2 are the weights of the two layers

σ1 and σ2 are activation functions
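Reading the symbols above as A = σ2(W2 · σ1(W1 · Z)), the excitation step can be sketched as follows. Using ReLU for both activations, a reduction ratio of 2, and the hand-picked weight values are assumptions for this example, not settings taken from the paper.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def excitation(Z, W1, W2):
    hidden = relu(matvec(W1, Z))      # reduce: f -> f / r
    return relu(matvec(W2, hidden))   # expand: f / r -> f

Z = [1.0, 2.0, 3.0, 4.0]              # f = 4 field statistics from Squeeze
W1 = [[0.5, 0.5, 0.0, 0.0],           # shape (f / r) x f with r = 2
      [0.0, 0.0, 0.5, 0.5]]
W2 = [[1.0, 0.0],                     # shape f x (f / r)
      [0.0, 1.0],
      [1.0, -1.0],
      [-1.0, 1.0]]

A = excitation(Z, W1, W2)             # [1.5, 3.5, 0.0, 2.0]
```

The bottleneck forces the per-field weights to be computed from a shared low-dimensional summary, so the weight of one field can depend on the statistics of the others.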

Re-Weight

Re-scale each original embedding vector by its learned weight

https://qiita-user-contents.imgix.net/https%3A%2F%2Fqiita-image-store.s3.ap-northeast-1.amazonaws.com%2F0%2F374063%2F7b83da8f-aab0-c0c8-4221-e793bbf59649.png?ixlib=rb-1.2.2&auto=compress%2Cformat&gif-q=60&s=fc56107ab5f89e9a1a75755b31ef127f

(for reference)
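The re-weight step is just a per-field scalar multiplication. A sketch, where A stands for the weights coming out of Excitation and the numbers are invented:

```python
def reweight(A, E):
    """Scale every element of field embedding e_i by its weight a_i."""
    return [[a * x for x in e] for a, e in zip(A, E)]

A = [0.5, 2.0]                 # one learned weight per field (made-up values)
E = [[1.0, 2.0],               # original embeddings, k = 2
     [3.0, 4.0]]

V = reweight(A, E)             # [[0.5, 1.0], [6.0, 8.0]]
```

V has the same shape as E, so the SENET-like embeddings can be fed to the same interaction layer as the original ones.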

Bilinear-Interaction Layer

The inner product is used in FM and FFM, while the Hadamard product is used in deep models
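To make the difference concrete: the inner product collapses a pair of embeddings to a single number, while the Hadamard product keeps one value per dimension. The vectors here are arbitrary.

```python
def inner(a, b):
    """Inner product: one scalar per feature pair."""
    return sum(x * y for x, y in zip(a, b))

def hadamard(a, b):
    """Hadamard (element-wise) product: a length-k vector per feature pair."""
    return [x * y for x, y in zip(a, b)]

v_i = [1.0, 2.0, 3.0]
v_j = [4.0, 5.0, 6.0]

s = inner(v_i, v_j)       # 32.0
p = hadamard(v_i, v_j)    # [4.0, 10.0, 18.0]
```

Summing the Hadamard product gives back the inner product, so the Hadamard form is the one that leaves more information for later layers.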

https://gyazo.com/d738e3a4e4da128cce3969c9e68e3183

https://gyazo.com/92f268bd71fa948b5a9a4a4d61898f37

https://gyazo.com/1a6151e0f19d18882c5d8a3d8c642256

https://gyazo.com/b66e3996b6861fad157cb4c155188cf1
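A sketch of the bilinear interaction p_ij = (v_i · W) ⊙ v_j for the three field types named in RQ3. The helper names, the pick_W callback, and the toy matrices are all assumptions for illustration; real weights would be learned.

```python
from itertools import combinations

def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]

def vecmat(v, W):
    """Row vector v (length k) times matrix W (k x k)."""
    return [sum(v[r] * W[r][c] for r in range(len(v)))
            for c in range(len(W[0]))]

def bilinear_interaction(E, pick_W):
    """p_ij = (v_i . W) * v_j for every field pair i < j.
    pick_W(i, j) decides which matrix the pair uses."""
    pairs = combinations(range(len(E)), 2)
    return [hadamard(vecmat(E[i], pick_W(i, j)), E[j]) for i, j in pairs]

E = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # f = 3 fields, k = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]

# Field-All: one W shared by every pair
P_all = bilinear_interaction(E, lambda i, j: swap)

# Field-Each: one W per left-hand field i
W_each = [identity, swap, identity]
P_each = bilinear_interaction(E, lambda i, j: W_each[i])

# Field-Interaction: one W per field pair (i, j)
W_pair = {(0, 1): identity, (0, 2): swap, (1, 2): swap}
P_pair = bilinear_interaction(E, lambda i, j: W_pair[(i, j)])
```

With W set to the identity the bilinear form reduces to the plain Hadamard product. The three variants differ only in how many k x k matrices they carry: 1, f, and f(f-1)/2 respectively.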

Combination Layer

https://gyazo.com/bf55ae472aee9d107d3e650f59ecd061
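A sketch of the combination step, assuming P holds the interaction vectors from the original embeddings and Q those from the SENET-like embeddings. The shallow_predict helper reflects my reading that the shallow model sums the concatenated vector and applies a sigmoid; treat that as an assumption.

```python
import math

def combine(P, Q):
    """Concatenate all interaction vectors into one flat vector c."""
    return [x for vec in P + Q for x in vec]

def shallow_predict(c):
    """Assumed shallow variant: sum the elements of c, then apply a sigmoid."""
    return 1.0 / (1.0 + math.exp(-sum(c)))

P = [[1.0, 2.0], [3.0, 4.0]]        # made-up values
Q = [[-1.0, -2.0], [-3.0, -4.0]]

c = combine(P, Q)                   # length = (len(P) + len(Q)) * k = 8
y_shallow = shallow_predict(c)      # sum(c) is 0.0 here, so this is 0.5
```

In the deep variant, c would instead be the input to the deep network.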

Deep Network

https://gyazo.com/2291e8571cd5668f8a194e9a0bb2db93

Output Layer

https://gyazo.com/65b350f73325f0643bf8d541a20b2659
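A sketch of the output layer, assuming the usual wide-plus-deep form: a sigmoid over a linear term on the raw features plus the scalar y_d produced by the deep network. The names w0, w, x, and y_d and all values are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w0, w, x, y_d):
    """Predicted CTR = sigmoid(bias + linear part + deep-network output)."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(linear + y_d)

# Made-up numbers: the linear part cancels out and y_d is 0, so the logit is 0
y_hat = predict(w0=0.0, w=[1.0, -1.0], x=[2.0, 2.0], y_d=0.0)   # 0.5
```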

EXPERIMENTS

(RQ1) How does our model perform as compared to the state-of-the-art methods for CTR prediction?

(RQ2) Can the different combinations of bilinear and Hadamard functions in Bilinear-Interaction layer impact its performance?

(RQ3) Can the different field types (Field-All, Field-Each and Field-Interaction) of Bilinear-Interaction layer impact its performance?

(RQ4) How do the settings of networks influence the performance of our model?

(RQ5) Which is the most important component in FiBiNET?

Experimental Testbeds and Setup

Datasets

Criteo

Avazu

Evaluation Metrics

AUC

log loss
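For reference, both metrics can be computed in a few lines of plain Python. This is a sketch for small inputs (the AUC here is the O(n²) pairwise definition), not what the paper's authors ran.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count as half a win)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.3, 0.4, 0.6]

score_auc = auc(y_true, y_prob)        # 3 of the 4 positive/negative pairs are ordered correctly
score_ll = log_loss(y_true, y_prob)
```

AUC only looks at ranking, while log loss also penalizes badly calibrated probabilities, which is why CTR papers tend to report both.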

Baseline Methods

See the results tables

Implementation Details

Describes the parameters and other settings

(The way this is presented seems well done)

Performance Comparison(RQ1)

https://gyazo.com/b174bf52c001741e07d75fa61b23a228

https://gyazo.com/0c4ea4f8ec5a1fc2a5bc6b29111ec0d9

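(I always wonder: is there any way to prove that the data split was not chosen in bad faith?)

(Since the margins are so narrow, might the train/test split simply have been picked so that the model wins?)

(I would like the model source code published, and the data split procedure as well)

https://gyazo.com/fff978307d94ec5d6d3ad143c09a2bf3

(For reference: results from the xDeepFM paper, KDD 2018)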

Combinations of Bilinear-Interaction Layer(RQ2)

Measures the effect of the Bilinear-Interaction layer

1: uses the bilinear function

0: uses the Hadamard product

first number: feature interaction method used on original embedding

second number: feature interaction method used on SENET-Like embedding

'10': the bilinear function is used as the feature interaction method on the original embedding, while the Hadamard function is used as the feature interaction method on the SENET-like embedding
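The two-digit codes are easy to misread, so here is a tiny decoder. The function name decode and the dictionary keys are made up for this note.

```python
METHODS = {"0": "hadamard", "1": "bilinear"}

def decode(code):
    """First digit: method on the original embedding.
    Second digit: method on the SENET-like embedding."""
    return {"original": METHODS[code[0]], "senet": METHODS[code[1]]}

all_codes = {code: decode(code) for code in ("00", "01", "10", "11")}
# all_codes["10"] -> {"original": "bilinear", "senet": "hadamard"}
```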

https://gyazo.com/004f1ce2838e5b932c772f71d9a7cc83

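On Criteo, applying the Hadamard product to the original embedding and the Hadamard product to the SENET embedding works best

On Avazu, applying the Hadamard product to the original embedding and the bilinear function to the SENET embedding works best

(So what? The discussion is lacking!)

(That said, evaluating the components of the proposed method separately is a good thing)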

Field Types of Bilinear-Interaction (RQ3)

Compares the three types of bilinear function

https://gyazo.com/06db57d1c72a1988b472a06391eb1b91

Hyper-parameter Investigation(RQ4)

Hyper-parameter study

(I like this! The best parameters really should be tuned per dataset)

https://gyazo.com/233f61ee4b5d66d2303fa13fa89f0e13

https://gyazo.com/7f5cb1153f53886879a050d6321b93a8

Ablation Study (RQ5)

https://gyazo.com/bee8292d5baeac5fa1c8c0f2f092af19

BASE: DeepSE-FM-Interaction

No BI: remove the Bilinear-Interaction layer from FiBiNET

No SE: remove the SENET layer from FiBiNET

(Comparisons like this are good)

CONCLUSIONS

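Impressions

The experiments are thorough and careful

I would have liked more discussion; it stops at describing the results

I like that it builds on existing work such as SENet

Are AUC and log loss really sufficient as evaluation metrics?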