Qi+’21 PP-Rec: News Recommendation with Personalized User Interest and Time-aware News Popularity (ACL 2021)

#ACL2021

Paper: https://aclanthology.org/2021.acl-long.424/

Authors: #Tao_Qi #Fangzhao_Wu #Chuhan_Wu #Yongfeng_Huang

#Chuhan_Wu: in recent years has had a large number of news recommendation papers accepted at top conferences

https://wuch15.github.io/

Reader: #totake

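TL;DR

Proposes a news recommendation framework that incorporates news popularity (e.g., CTR)

Recommends articles based on both the popularity of a news article and how well it matches the user's interests

Experiments on two datasets confirm substantial improvements in recommendation accuracy and diversity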

Motivation

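How existing news recommendation typically works

Based on matching the candidate articles against user interests inferred from past behavior

Example: score by the inner product of the article vector and the user vector (e.g., the mean of the vectors of previously clicked articles)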
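
The inner-product scoring in the example above can be sketched as follows. This is a minimal illustration, not code from the paper: the function names, the 2-dimensional vectors, and the candidate IDs are all made up for the example.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(clicked_vectors, candidates):
    """Rank candidate articles by inner product with the user vector.

    The user vector is simply the mean of the clicked-article vectors.
    """
    user_vec = mean_vector(clicked_vectors)
    scored = [(news_id, dot(user_vec, vec)) for news_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): the user mostly clicked articles along dimension 0.
clicked = [[1.0, 0.0], [0.8, 0.2]]
candidates = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
ranking = rank_candidates(clicked, candidates)
print(ranking)
```

With an empty or very short click history the user vector is undefined or noisy, which is the cold-start problem listed next.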

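Two known problems

Cold start

New users have few behavior logs, so accurate recommendation is difficult

Filter bubble

Only articles similar to those read in the past get recommended

Users rarely encounter novel information, which can hurt the user experience

In contrast: this work focuses on the information carried by news "popularity"

Social importance (disasters, epidemics, elections, etc.)

Such news attracts users regardless of their personal interests (Yang'16)

Popular news is diverse and covers a wide range of topics (Houidi+’19)

What taking news popularity into account is expected to bring

Better handling of cold start

Mitigation of the low-diversity problem of recommendation models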

Key Idea: PP-Rec

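Considers both news popularity and the match with the target user's interests

News popularity: influenced by many factors

-> Predicted in a time-aware manner from news content, recency, and real-time CTR

News popularity influences user behavior and may bias the modeling of user interests

-> Proposes a popularity-aware user encoder that can reduce the effect of popularity bias

Additionally, a mechanism that takes the entities in an article (e.g., person names) into account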

PP-Rec: Overview

https://gyazo.com/4cb750d42928ff1a003bf5de3bbc57ee

$ s_m: personalized matching score

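Predicted from the news embedding and the user embedding

News embedding: encoded by the knowledge-aware news encoder (Fig. 3)

Uses the article text and entity information

User embedding: encoded by the popularity-aware user encoder (Fig. 4)

Uses the previously clicked news articles and the popularity of those articles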

$ s_p: (time-aware) news popularity score

Predicted by the time-aware news popularity predictor

Predicted from news content, news recency, and real-time CTR

Knowledge-aware News Encoder

https://gyazo.com/967f7efffaa78ee662c0e34b1ad309aa

Builds the news embedding from the title text and entity information

Text: pre-trained word embeddings

Entity: pre-trained entity embeddings

Introduces an entity multi-head self-attention network (MHSA) to capture the relations among entities

For example, the entity “MAC” that appears with the entity “Lancome” may indicate cosmetics while it usually indicates computers when appears with the entity “Apple”.

Introduces an entity multi-head cross-attention network (MHCA) to capture the textual context

For example, the entity “MAC” usually indicates computers if its textual contexts are “Why do MAC need an ARM CPU?” and indicates cosmetics if its textual contexts are “MAC cosmetics expands AR try-on”.

The final entity representation is the sum of the representation from MHSA and the representation from MHCA

Words are encoded in the same way

entity-based news representation $ \mathbf{e} (via entity attention network)

word-based news representation $ \mathbf{w} (via word attention network)

Final news representation $ \mathbf{n}: weighted sum of $ \mathbf{e} and $ \mathbf{w}
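
A minimal sketch of the last step, combining the entity-based and word-based views into one news vector with attention weights. The notes only say "weighted sum", so the scoring function used here (a query vector dotted with `tanh` of each view) and all names are assumptions for illustration, not the paper's exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_views(e, w, query):
    """Attention-weighted sum of the entity view e and the word view w.

    Each view is scored as query . tanh(view); this scoring form is an
    assumption made for this sketch.
    """
    scores = [sum(q * math.tanh(x) for q, x in zip(query, view)) for view in (e, w)]
    alpha_e, alpha_w = softmax(scores)
    n = [alpha_e * ei + alpha_w * wi for ei, wi in zip(e, w)]
    return n, (alpha_e, alpha_w)


# With a zero query both views receive equal weight.
n, weights = combine_views([1.0, 0.0], [0.0, 1.0], query=[0.0, 0.0])
print(n, weights)
```

In a trained model the query would be a learned parameter, so the balance between entities and words can differ per article.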

Time-aware News Popularity Predictor

Predicts time-aware news popularity from news content, news recency, and real-time CTR

https://gyazo.com/ffd52fffc002db7ecb160e275a9d7f69

The final (time-aware) news popularity score $ s_p is a weighted sum of the real-time CTR $ c_t and $ \hat{p}:

$ s_p=w_c \cdot c_t + w_p \cdot \hat{p}

The weights are learned

Real-time CTR $ c_t: news popularity decays over time -> use the value aggregated over the most recent $ t hours

$ \hat{p}: weighted sum of $ \hat{p_r} and $ \hat{p_c}

$ \hat{p_r}: recency-aware content-based news popularity

$ \hat{p_c}: content-based news popularity

Articles with different content should also differ in how their popularity decays

Learns a content-specific aggregator to combine $ \hat{p_c} and $ \hat{p_r}

https://gyazo.com/df379872d9f855a6fb56f628e0396c57

News content: the representation $ \mathbf{n} produced by the Knowledge-aware News Encoder above

News recency: the time elapsed since publication, encoded by a recency embedding layer into the representation $ \mathbf{r}
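
The pieces of the popularity predictor can be put together in a small sketch. The formula for $ s_p follows the equation in these notes. The sigmoid gate for the content-specific aggregator, the clicks/impressions definition of real-time CTR, and every numeric value are assumptions for illustration. In the model the gate would be computed from $ \mathbf{n} and $ \mathbf{r}; here it is passed in as a plain number (`gate_logit`, a hypothetical name).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def recent_ctr(clicks, impressions):
    """Real-time CTR c_t from counts aggregated over the recent window."""
    return clicks / impressions if impressions > 0 else 0.0


def time_aware_content_popularity(p_c, p_r, gate_logit):
    """p_hat as a gated mix of p_c and p_r (gate form is an assumption)."""
    theta = sigmoid(gate_logit)
    return theta * p_c + (1.0 - theta) * p_r


def popularity_score(c_t, p_hat, w_c, w_p):
    """s_p = w_c * c_t + w_p * p_hat, with w_c and w_p learned in practice."""
    return w_c * c_t + w_p * p_hat


# Toy numbers (hypothetical): 30 clicks out of 1000 recent impressions.
c_t = recent_ctr(clicks=30, impressions=1000)
p_hat = time_aware_content_popularity(p_c=0.2, p_r=0.6, gate_logit=0.0)
s_p = popularity_score(c_t, p_hat, w_c=1.0, w_p=0.5)
print(c_t, p_hat, s_p)
```

A freshly published article has few impressions, so its $ c_t is unreliable. In that situation the content and recency terms carry most of the signal.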

Popularity-aware User Encoder

https://gyazo.com/e3ed9d497ffc22554bad794d3f66ff3c

Applies self-attention across the news representations

Popularity embedding of the i-th clicked article: $ p_i

content-popularity joint attention network (CPJA)

https://gyazo.com/450184b691f6c882369a9fdd9d122490

$ \alpha_i: attention weight

$ \mathbf{m_i}: contextual news representation of the i-th clicked news

final user interest embedding

https://gyazo.com/f033c0965eb16fcc6013f869e2590b97
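
A rough sketch of the idea behind CPJA: the attention weight of each clicked article depends on both its contextual representation and its popularity embedding, and the user vector is the weighted sum of the contextual representations. The exact equations are in the figures linked above. The scoring function, the query vector, and the toy values below are assumptions chosen only to show how a click on a popular article can be down-weighted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cpja_pool(news_reprs, pop_embs, query):
    """Attention pooling over clicked news using content and popularity jointly.

    news_reprs: contextual representations m_i (lists of floats)
    pop_embs:   popularity embeddings p_i (lists of floats)
    query:      vector over the concatenation [m_i; p_i] (learned in practice)
    """
    scores = []
    for m_i, p_i in zip(news_reprs, pop_embs):
        joint = m_i + p_i  # list concatenation, i.e. [m_i; p_i]
        scores.append(sum(q * math.tanh(x) for q, x in zip(query, joint)))
    alphas = softmax(scores)
    dim = len(news_reprs[0])
    u = [sum(a * m[d] for a, m in zip(alphas, news_reprs)) for d in range(dim)]
    return u, alphas


# Toy case (hypothetical): the first click was on a very popular article.
# The query puts a negative weight on the popularity dimension.
news_reprs = [[1.0, 0.0], [0.0, 1.0]]
pop_embs = [[2.0], [0.0]]
u, alphas = cpja_pool(news_reprs, pop_embs, query=[0.0, 0.0, -1.0])
print(u, alphas)
```

The negative weight on popularity is hand-set here to show the intended effect. In the model the attention parameters are learned.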

model training

personalized aggregator

https://gyazo.com/e63fc93f86f9a396d5a73fc4616187f4

$ \eta: learned from $ \mathbf{u}

The weight varies per user
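
A sketch of the personalized aggregator, assuming the final score is a convex combination of $ s_m and $ s_p with a gate $ \eta produced from $ \mathbf{u} by a single linear layer plus sigmoid. The exact form is in the figure linked above. The single-layer gate and all parameter values here are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def personalized_score(s_m, s_p, user_vec, gate_weights, gate_bias):
    """Mix matching score s_m and popularity score s_p with a per-user gate.

    eta = sigmoid(gate_weights . user_vec + gate_bias); a larger eta puts
    more weight on popularity for that user.
    """
    logit = sum(w * x for w, x in zip(gate_weights, user_vec)) + gate_bias
    eta = sigmoid(logit)
    return (1.0 - eta) * s_m + eta * s_p, eta


# A zero user vector (e.g., no history yet) gives eta = 0.5 with zero bias.
score, eta = personalized_score(
    s_m=0.8, s_p=0.2, user_vec=[0.0, 0.0], gate_weights=[1.0, 1.0], gate_bias=0.0
)
print(score, eta)
```

Because $ \eta depends only on the user vector, it can be computed once per user and reused across all candidate articles in a request.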

loss

BPR pairwise loss (Rendle et al., 2009)

https://gyazo.com/97253a0bb84c535660a7227333ace450

negative sampling technique

Negatives are sampled from articles shown to that user as impressions (presumably within the same session)
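
A minimal sketch of BPR training with one sampled negative per click. The loss is the standard BPR form, the mean of -log σ(s_pos - s_neg). The sampling helper assumes negatives are the non-clicked articles in the same impression. The helper names and article IDs are made up.

```python
import math
import random


def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(s_pos - s_neg)) over (positive, negative) pairs."""
    losses = []
    for s_pos, s_neg in zip(pos_scores, neg_scores):
        x = s_pos - s_neg
        # -log(sigmoid(x)) = log(1 + exp(-x)), written in a stable form.
        losses.append(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))
    return sum(losses) / len(losses)


def sample_negative(impression, clicked, rng):
    """Pick one non-clicked article from the same impression list."""
    pool = [news_id for news_id in impression if news_id not in clicked]
    return rng.choice(pool)


rng = random.Random(0)
negative = sample_negative(["n1", "n2", "n3", "n4"], clicked={"n2"}, rng=rng)
loss_good = bpr_loss([2.0], [0.0])  # clicked article scored higher
loss_bad = bpr_loss([0.0], [2.0])   # clicked article scored lower
print(negative, loss_good, loss_bad)
```

The loss only depends on score differences, so it pushes the clicked article above the sampled non-clicked one without fixing absolute score values.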

Experiments

Dataset

MSN: Microsoft News website from October 19 to November 15, 2019

Feeds: commercial news feeds in Microsoft from January 23 to April 23, 2020

https://gyazo.com/b38d3e603085dec98c87196d286d4fdb

Evaluation metrics

AUC, MRR, nDCG@5, and nDCG@10
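
For reference, a sketch of how these metrics can be computed for a single impression (a list of candidate scores with 0/1 click labels). Conventions vary between papers and toolkits. This version uses the reciprocal rank of the first clicked item and binary-gain nDCG, which may not match the paper's evaluation script exactly.

```python
import math


def ranked_labels(scores, labels):
    """Labels reordered by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(scores, labels):
    """1 / rank of the first clicked item, or 0.0 if nothing was clicked."""
    for rank, y in enumerate(ranked_labels(scores, labels), start=1):
        if y == 1:
            return 1.0 / rank
    return 0.0


def dcg_at_k(ranked, k):
    return sum(y / math.log2(pos + 1) for pos, y in enumerate(ranked[:k], start=1))


def ndcg_at_k(scores, labels, k):
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_labels(scores, labels), k) / ideal


def auc(scores, labels):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5.

    Assumes at least one clicked and one non-clicked item.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.1]
labels = [0, 1, 0]
print(auc(scores, labels), reciprocal_rank(scores, labels), ndcg_at_k(scores, labels, 5))
```

Each metric is computed per impression and then averaged over impressions to get the reported number.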

Resources

Glove embeddings (Pennington et al., 2014)

100-dimensional vectors pre-trained on WikiData via TransE (Bordes et al., 2013)

Baselines

popularity-based news recommendation methods

ViewNum (Yang, 2016)

using the number of news views to measure news popularity

RecentPop (Ji et al., 2020)

using the number of recent news views to measure news popularity

SCENE (Li et al., 2011)

using view frequency to measure news popularity and adjusting the ranking of news with same topics based on their popularity

CTR (Ji et al., 2020)

using news CTR to measure news popularity

personalized news recommendation methods

EBNR (Okura et al., 2017)

utilizing an auto-encoder to learn news representations and a GRU network to learn user representations

DKN (Wang et al., 2018)

utilizing a knowledge-aware CNN network to learn news representations from news titles and entities

NAML (Wu et al., 2019a)

utilizing attention network to learn news representations from news title, body and category

NPA (Wu et al., 2019b)

utilizing personalized attention networks to learn news and user representations

NRMS (Wu et al., 2019e)

utilizing multi-head self-attention networks to learn both news and user representations

LSTUR (An et al., 2019)

modeling users’ short-term interests via the GRU network and long-term interests via the user ID

KRED (Liu et al., 2020)

learning news representation from titles and entities via a knowledge graph attention network

Main Result

https://gyazo.com/c4501a3dc0d4d80043af5f8d4c642841

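The proposed method is best on every metric

The gains are consistent and drastic

The other recommendation models cannot use CTR, so the result is somewhat to be expected, though...

Simple ranking by CTR is a strong baseline (no personalization)

Interestingly, some recent methods lose even to CTR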

Cold-Start Users

https://gyazo.com/ed6c7173486d2d37ab1b787eb5374d89

Benefit of taking popularity into account

Diversity

https://gyazo.com/b97db778c3fc06151084f74f698ae042

Benefit of taking popularity into account

How much news on topics absent from the click history is included

https://gyazo.com/5b513adf4a51bc696268425a7491dfce

Benefit of taking popularity into account
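
One simple way to quantify "topics absent from the click history" is the share of recommended articles whose topic the user has never clicked. This is only an illustrative definition with made-up topic labels. The paper's exact metric is in the figure linked above and may differ.

```python
def new_topic_ratio(recommended_topics, clicked_topics):
    """Share of recommended articles whose topic is not in the click history."""
    if not recommended_topics:
        return 0.0
    seen = set(clicked_topics)
    unseen = sum(1 for topic in recommended_topics if topic not in seen)
    return unseen / len(recommended_topics)


ratio = new_topic_ratio(
    recommended_topics=["sports", "finance", "health", "sports"],
    clicked_topics=["sports"],
)
print(ratio)
```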

Ablation Study

news popularity score/personalized matching scoreの効果

https://gyazo.com/eebbd6d73fe07d19eabfbf1a99c41e5e

Both contribute (though the drop from ablating just one of them seems a little too large...)

Focusing on the time-aware news popularity predictor

https://gyazo.com/8869ba3bc505ff506c067360d2bf1380

News recency is necessary

News content is also necessary

It compensates for the somewhat unreliable real-time CTR

Real-time CTR is the most important

Qualitative analysis

https://gyazo.com/277c77a1176760b20a45664ea3575fe6

Discussion

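A mechanism that computes the weights for news popularity and interest match per user

Looks good (would like to adopt it if possible)

It would be nice if it could be simpler, though... (e.g., lean toward interests for heavy users)

Assumes that a large-scale KG is available

What about Japanese...?

Practical concerns

User vectors can be computed in advance

The Time-aware News Popularity Predictor could probably also be run periodically at a fixed interval

At inference time the personalized aggregator has to be run

The computation required at request time increases substantially

Ablations I would have liked to see

With/without entities

Aggregation by a simpler method (e.g., a uniform weighted sum with CTR)

Though that probably would not work well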