Wu+'20 MIND: A Large-scale Dataset for News Recommendation (ACL 2020)

Paper: https://www.aclweb.org/anthology/2020.acl-main.331/

Authors: Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, Ming Zhou

Dataset: https://msnews.github.io

#ACL2020 #totake

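Why I picked this paper

The topic sits right at the center of what we do in Media ML, so it should offer many useful insights

A good way to catch up on recent trends in news recommendation

Lines marked (※) are the reader's own opinions and impressions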

TL;DR

The authors built a large-scale, high-quality news dataset that can serve as a benchmark for news recommender systems

1 million users

more than 160k English news articles

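They evaluated a range of SOTA models on the dataset and found the following

Model performance depends heavily on two things: (1) understanding news content and (2) modeling user interests

NLP techniques matter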

Motivation

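News recommender systems are important

Nobody can read all of the news produced every day

Personalization improves the news reading experience (Wu et al., 2019b)

Many platforms already run news recommender systems in production

How news recommendation differs from recommendation in other domains

News articles turn over quickly

Fresh news keeps appearing while older news expires fast

As a result, the cold-start problem becomes a bigger challenge

News articles carry rich text (title/body)

Representing an item by a bare ID is not appropriate

No explicit ratings are available from users

User interests have to be inferred implicitly from click behavior

(※) This view of the problem seems to be shared across the field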

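The dataset problem

News recommendation is less explored than, for example, movie recommendation

Main cause: there is no dataset

AI research has been accelerated by well-maintained benchmark datasets

e.g., ImageNet, SQuAD

No comparable benchmark dataset exists for news recommendation -> the authors want to build one

Domains other than news have them: Amazon dataset, MovieLens dataset

News domain: most papers report experiments on in-house data that is not publicly available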

Dataset Construction

Data and log collection

Microsoft News

https://gyazo.com/2e1da870913da463313f78f25d7fd6cc

From Microsoft News users, 1 million users with at least 5 clicks over a 6-week period were sampled at random

To protect privacy, user IDs were anonymized with one-time salt mapping (https://en.wikipedia.org/wiki/Salt_(cryptography) )

Format of one record in the dataset:

$ (\mathrm{userID}, \mathrm{timestamp}, \mathrm{ClickHistory}, \mathrm{ImpressionLog}),

$ \mathrm{userID}: anonymized user ID

$ \mathrm{timestamp}: timestamp of the $ \mathrm{ImpressionLog}

$ \mathrm{ClickHistory}: the user's click history

$ \mathrm{ImpressionLog}:= ((\mathrm{articleID_1}, label), (\mathrm{articleID_2}, label), ..),

$ label \in \{0, 1\}.
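As a reading aid, the record format above can be sketched as a small Python data structure. The class name, field names, and all sample values below are hypothetical and are not taken from the released files; only the four-field layout and the 0/1 label come from the notes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImpressionRecord:
    """One record: (userID, timestamp, ClickHistory, ImpressionLog)."""
    user_id: str                            # anonymized user ID
    timestamp: str                          # timestamp of the impression log
    click_history: List[str]                # article IDs clicked earlier
    impression_log: List[Tuple[str, int]]   # (articleID, label), label in {0, 1}

    def clicked(self) -> List[str]:
        """Article IDs in this impression with label 1 (clicked)."""
        return [a for a, label in self.impression_log if label == 1]

    def skipped(self) -> List[str]:
        """Article IDs in this impression with label 0 (shown, not clicked)."""
        return [a for a, label in self.impression_log if label == 0]


# Made-up sample values, for illustration only.
record = ImpressionRecord(
    user_id="U0001",
    timestamp="t0",
    click_history=["N10", "N42"],
    impression_log=[("N7", 0), ("N99", 1), ("N3", 0)],
)
print(record.clicked())  # ['N99']
```

Keeping the non-clicked articles of each impression next to the clicked ones is what allows them to be used as negatives for implicit-feedback training.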

Article information

articleID

Title (title) / abstract (abst) / body (body)

Category (e.g., sports) -> assigned manually by editors

Entities (e.g., person names) contained in title/abst/body

Extracted with an in-house NER model and an entity linking model: NER -> link to WikiData

Knowledge triples

Because the entities are linked to WikiData, (head_entity, relation, tail_entity) triples can be extracted

TransE (Bordes et al., 2013) is trained on these knowledge triples

TransE -> a common method for Knowledge Base Completion

The learned entity embeddings and relation embeddings are also included in the dataset

This information is provided with knowledge-aware news recommendation in mind
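For reference, the idea behind TransE can be sketched in a few lines: a triple (head, relation, tail) is considered plausible when head + relation lands close to tail in embedding space, and training pushes true triples below corrupted ones by a margin. This is a minimal sketch of the scoring function and margin loss from Bordes et al. (2013), not the training setup used to build the dataset; the function names and the 2-d toy embeddings are made up.

```python
import math
from typing import Sequence


def transe_distance(head: Sequence[float],
                    relation: Sequence[float],
                    tail: Sequence[float]) -> float:
    """L2 distance ||h + r - t||; smaller means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))


def margin_loss(pos_distance: float, neg_distance: float,
                margin: float = 1.0) -> float:
    """Hinge loss: zero once the true triple beats the corrupted one by `margin`."""
    return max(0.0, margin + pos_distance - neg_distance)


# Toy embeddings (made-up values).
h = [0.5, 0.25]
r = [0.25, -0.25]
t_true = [0.75, 0.0]     # equals h + r, so the distance is 0
t_corrupt = [-0.5, 1.0]  # a randomly swapped-in tail

pos = transe_distance(h, r, t_true)
neg = transe_distance(h, r, t_corrupt)
print(pos, neg, margin_loss(pos, neg))
```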

Stats

https://gyazo.com/482e576fb7f630c7d8a6564e1a65192c

(※) Panel (d) is interesting

The survival time of a news article is estimated here using the time interval between its first and last appearance time in the dataset.

I would like to know whether "appearance" means an impression or a click, but...

Comparison with existing datasets

https://gyazo.com/01d9be31ae40404c28416ac9cc2df582

Experimental Results

Various recommendation models are evaluated on the constructed dataset

Candidate Models

General Recommendation Methods

LibFM (Rendle, 2012)

classic recommendation method based on factorization machine

DSSM (Huang et al., 2013)

deep structured semantic model

Wide&Deep (Cheng et al., 2016)

two-channel neural recommendation method

DeepFM (Guo et al., 2017)

popular neural recommendation method which synthesizes deep neural networks and factorization machines

News Recommendation Methods

DFM (Lian et al., 2018)

deep fusion model

GRU (Okura et al., 2017)

a neural news recommendation method which uses autoencoder

A paper from Yahoo! Japan

DKN (Wang et al., 2018)

a knowledge-aware news recommendation method

Uses entity embeddings

NPA (Wu et al., 2019b)

personalized attention mechanism to select important words and news articles

NAML (Wu et al., 2019a)

attentive multi-view learning to incorporate different kinds of news information

LSTUR (An et al., 2019)

long- and short-term user interests

https://gyazo.com/d673bc6e53d54fa06c47c71311341381

NRMS (Wu et al., 2019c)

multi-head self-attention to learn news representations

https://gyazo.com/8b05bedb370e25cd5f2b196dafdb803e

Main Results

https://gyazo.com/3f5cdb3e79f3d366f8e71ca71390205b

Findings

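(1) Models designed specifically for news recommendation outperform the general-purpose ones

Because they learn representations with end-to-end neural models (cf. the general methods mostly rely on hand-crafted features)

(2) NRMS performs best

Multi-head self-attention is strong

LSTUR is the runner-up

Explicit modeling of user interests pays off

(3) Accuracy barely drops even when evaluation is restricted to unseen users

The models generalize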

Text feature extraction

How the three best-performing models (NAML/LSTUR/NRMS) behave when the text feature extraction method is varied

https://gyazo.com/22824f1d3032cb79dc903eb112795c47

Findings

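Neural text representations are indeed powerful

Self-Att/LSTM > CNN

Self-Att/LSTM are stronger than CNN because they capture long-range contexts better

The combination of LSTM and attention is best

The power of pre-trained LMs

BERT (Devlin et al., 2019) is used for text feature extraction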

https://gyazo.com/683a7e58470f1f39dba647c804f368a8

Findings

Performance improves for all three models

Fine-tuning improves it further

One might expect word emb + LSTM to be enough, but it is not

When abst/body are also used

direct concatenation (denoted as Con)

attentive multi-view learning (denoted as AMV) (Wu et al., 2019a)

https://gyazo.com/0d62112205b01b100cbd497379970ba7
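To make the contrast between Con and AMV concrete, here is a toy sketch. It assumes that Con simply joins the three texts into one token sequence before encoding (the notes do not spell this out), and it replaces the learned attention network of AMV with a dot product against a fixed query vector. The mean-of-word-vectors encoder, the word vectors, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Toy 2-d word vectors (made-up values).
WORD_VECS: Dict[str, List[float]] = {
    "goal": [1.0, 0.0],
    "match": [1.0, 0.0],
    "tax": [0.0, 1.0],
}


def encode(tokens: List[str]) -> List[float]:
    """Hypothetical text encoder: mean of the word vectors."""
    vecs = [WORD_VECS[t] for t in tokens]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]


def concat_representation(title, abst, body) -> List[float]:
    """Con (assumed): encode title + abst + body as one long sequence."""
    return encode(title + abst + body)


def amv_representation(title, abst, body,
                       query: List[float]) -> Tuple[List[float], List[float]]:
    """AMV (simplified): encode each view separately, then attention-pool."""
    views = [encode(title), encode(abst), encode(body)]
    scores = [sum(q * x for q, x in zip(query, view)) for view in views]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # numerically stable softmax
    weights = [e / sum(exps) for e in exps]
    pooled = [sum(w * view[i] for w, view in zip(weights, views))
              for i in range(2)]
    return pooled, weights


title, abst, body = ["goal"], ["match"], ["tax"] * 4
query = [1.0, 0.0]  # fixed here; it would be learned in a real model

con = concat_representation(title, abst, body)
amv, weights = amv_representation(title, abst, body, query)
print(con, amv, weights)
```

In this toy case the long body dominates the concatenated representation, while the attention weights let the short title and abstract keep their influence.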

Findings

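Using title/abst/body all together is best

Adding category labels and entity information gives further small gains

(※) The gains are modest, but they are consistent, so these features do seem to help

User interest modeling

Several ways of building the user representation are examined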

simple average of the representations of previously clicked news (Average)

attention mechanism used in (Wu et al., 2019a) (Attention)

candidate-aware attention used in (Wang et al., 2018) (Candidate-Att)

gated recurrent unit used in (Okura et al., 2017) (GRU)

long- and short-term user representation used in (An et al., 2019) (LSTUR)

multi-head self-attention used in (Wu et al., 2019c) (Self-Att)

https://gyazo.com/2d601b4c6ccc2aaa69051e9830def1e6

Findings

Attention, Candidate-Att, GRU > Average

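Attention: can selectively extract features from user behaviors

Candidate-Att: can take the candidate article into account when choosing which user behaviors are informative

GRU: good at sequential (temporal) information

LSTUR: explicitly models long-term and short-term user interests

Self-Att: can model long-range relations between user behaviors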
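The difference between Average and Candidate-Att can be illustrated with a toy sketch. It assumes that news vectors are already computed, that attention weights are a softmax over dot products with the candidate (the cited models learn these weights with a small network), and that the click score is a dot product between user and candidate vectors. All names and numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)                       # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def average_user_vector(clicked: List[Vector]) -> Vector:
    """Average: every clicked article contributes equally."""
    n = len(clicked)
    return weighted_sum(clicked, [1.0 / n] * n)


def candidate_aware_user_vector(clicked: List[Vector],
                                candidate: Vector) -> Vector:
    """Candidate-Att (simplified): weight clicks by relevance to the candidate."""
    weights = softmax([dot(v, candidate) for v in clicked])
    return weighted_sum(clicked, weights)


# A user who clicked one "sports-like" and one "politics-like" article.
clicked = [[1.0, 0.0], [0.0, 1.0]]
candidate = [1.0, 0.0]  # a sports-like candidate

avg_score = dot(average_user_vector(clicked), candidate)
att_score = dot(candidate_aware_user_vector(clicked, candidate), candidate)
print(avg_score, att_score)
```

With a sports-like candidate, the candidate-aware user vector leans on the sports click and scores the candidate higher than the plain average does, which is the intuition behind the Candidate-Att finding above.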

Cold-start problem

https://gyazo.com/f1dfbc2024baea9b772386ec0cae9355

Discussion

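Room for improving our in-house models

Both the text and the user feature extraction look useful as references

e.g., adding Self-Attention

Are news recommendation models that use thumbnail image features uncommon (rare)?

How to combine the model's output score with quantities computed from logs (e.g., CTR) seems quite important, but...

I wonder how Microsoft handles this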