Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback based Recommendation

#SIGIR2021 #SIGIR

Paper: https://arxiv.org/abs/2105.07377

Authors: Lei Chen, Le Wu, Kun Zhang, Richang Hong, Meng Wang

読み手： #totake

選んだ理由

問題設定がGunosyにおけるニュース記事推薦とほとんど同じで学びがありそう

Set2setRank -> 名前が気になる（集合をランキング...？）

TL; DR

推薦モデルの学習フレームワーク：Set2setRankを提案

提案法はimplicit feedbackを利用した協調フィルタリングに基づく推薦システムに汎用的に適用可能

提案法は実用に耐えうる計算量で性能にも優れる

/icons/hr.icon

Background

協調フィルタリングに基づく推薦手法はパフォーマンスに優れる傾向

初期の協調フィルタリングのモデルはexplicit user feedback（例：5段階評価）に基づいていた

しかし実応用の場面ではユーザーフィードバックは暗黙的（implicit feedback）であることが多い

例：クリックしたかどうか、購入したかどうか、レストランを訪れたかどうか

そのアイテムを「どのくらい良いと思ったか」という情報は得られない

近年こういったimplicit feedbackに基づく推薦システムの研究が盛ん

implicit feedbackに基づく推薦モデルの精度は様々な手法で向上してはいるが、まだ満足できるものではない

原因の一つ：ユーザーが観測するアイテムのスパース性

大量にあるアイテム候補の中で、ユーザーの目に触れるのはほんの一部

e.g., the density of most user-item interaction matrix is less than 1%

あるユーザーから得られるimplicit feedbackは2種類

Observed item set（少数）

ユーザーによって観測され、知覚可能なフィードバック（例：クリック）が与えられたアイテム集合

Unobserved item set（たくさん）

①ユーザーによって観測されたが、ネガティブなユーザー行動（例：クリックしない）をもたらしたアイテム集合

②ユーザーが観測しなかったアイテム集合

/icons/hr.icon

Motivation

協調フィルタリングに基づく既存手法の課題

既存手法①：Pairwise model

サンプルされたあるアイテムについて正例 or 負例を学習

e.g., Bayesian Personalized Ranking (BPR)

仮定：あるObserved itemはランダムに選ばれた Unobserved itemよりもユーザーに好まれる

課題：学習データに内在する構造的な情報（特定のアイテム間の関係）を無視している

For example, a user 𝑢 likes item 𝑖1 and 𝑖2, and has an unobserved item set of 𝑗1, 𝑗2, 𝑗3. By independently sampling two pairwise relations: 𝑖1 >𝑢 𝑗1 and 𝑖2 >𝑢 𝑗2, the ground truth of 𝑖2 >𝑢 𝑗1 is not considered due to the independence assumption

既存手法②：Listwise model

アイテムのランキングそのものを学習

e.g., ランキング評価指標を直接最適化 or 正例のランキングの生成確率を最大化

特徴：正解のランキング（の一部）を見てlossを計算できるので、①の課題は解消されている

問題：ランキング（順列）の生成確率やlossの計算コスト高

長さnのランキングはn!通り

ここを頑張る研究も多く存在

e.g., Oosterhuis'21 Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness (SIGIR 2021 best paper)

https://arxiv.org/abs/2105.00855

/icons/hr.icon

Key Idea：Set2setRank framework

気持ち：pairwise modelとlistwise modelの中間

アイテムではなくアイテム集合をサンプリングして学習する

Set2setRank: Overview

https://gyazo.com/c3255464bcec7462ed40588070a2f49d

最適化は2つのstepからなる

①: アイテム<->アイテム集合の比較&loss計算

目的：サンプルされたObserved itemに（同じくサンプルされた）Unobserved item setの任意のアイテムよりも高いスコアを与える

②: アイテム集合間の比較&loss計算

目的：Observed itemとUnobserved item setの中で最も"hard"なアイテムの間のマージンを大きくする

Set2setRank: Details

Notation

$ \mathbf{U}: ユーザー集合

$ \mathbf{V}：アイテム集合

$ \mathbf{R}= [r_{uv}]_{|U|\times |V|} \in \{0, 1\} ：user-iteminteractionmatrix

$ R_{u}^{+}where $ u \in \mathbf{U}: observed feedback

$ R_{u}^{-}=\mathbf{R}\setminus R_{u}^{+}: non-interactive feedback

$ \mathbf{\hat{R}}= [\hat{r}_{uv}]_{|U|\times |V|} : predicted preference score matrix

$ \mathbf{E}=\{\mathbf{E}_U, \mathbf{E}_V\}: user/item embeddings

$ \hat{r}_{uv}=e_{u}^{T}e_v

$ S_u^+ = \{\hat{r}_{ui}^{+}\mid i \in R_u^+\}, |S_u^+|=L

サンプリングしたobserved item set

$ S_u^- = \{\hat{r}_{uj}^{-}\mid j \in R_u^-\}, |S_u^-|=K

サンプリングしたunobserved item set

Preference Gain Function：2要素間の比較関数

$ D(x, y) =\sigma(x-y),$ x \in S_{u}^{+}, y \in S_{u}^{-}

$ \sigma():activation function.

Comparison function：集合$ Xとアイテム$ y_j \in Yの比較関数

https://gyazo.com/6420727e88b86cd3797e42ebbb436582

$ x_i \in Xについてsumをとるだけ

Item to Set Comparison：アイテム<->アイテム集合の比較関数

https://gyazo.com/804b2d769f7d8c5c781ac8e5e671123a

$ S_u^-の各要素についてsum

Set to Set Comparison：アイテム集合<->アイテム集合の比較関数

https://gyazo.com/d1e1436c149969156e8239b01a6bec93

イメージ：$ S_u^+の要素同士の距離の平均

https://gyazo.com/020f8c71d81bf8cb7b3d0aad3fbd604c

most "hard" sampleを選ぶ

https://gyazo.com/84b17eca8b30ea7714ed76ad949efb91

Objective Function

https://gyazo.com/dac0474eff87696b73ff243ffadd6b53

Adaptive Set2setRank：サンプリングの工夫

ユーザーによって$ R_{u}^{+}の大きさにばらつきがある

-> サンプルした$ S_{u}^{+}をランダムにマスクする

計算量について (sec 4.4)

lossの計算量

$ O(|U|M_{avg}K)

CLiMF >> SetRank > Set2setRank≒BPR

c.f. SetRank: $ O(|U|M_{max}(K+1))

ユーザーの観測する最大アイテム数（$ M_{max}）>> ユーザーの観測する平均アイテム数（$ M_{avg}）であることがキモ

e.g., 𝑀𝑚𝑎𝑥 = 6479 and 𝑀𝑎𝑣𝑔 = 101 on MovieLens-20M dataset

/icons/hr.icon

Experiments

Settings

Datasets

Amazon Books

MovieLens-20M

Yelp

Evaluation Metrics

NDCG

The HR measures the percentage of the liked items that are presented in the ranking list

Baselines

classical matrix factorization model (BPR)

graph-based recommendation model (LR-GCCF)

state-of-the-art graph-based recommendation model

CLiMF

typical listwise method for implicit feedback

Deep-SetRank

state-of-the-art list-wise learning model

Pop-Sampling

Pop-Sampling samples unobserved items based on the popularity of each item in the training data

AoBPR

adaptiveandcontext-dependent sampling distribution to build a non-uniform item sampler

Proposed method and variants

Set2setRank_BPR: BPR w/ Set2setRank

Set2setRank_GCN: GCN w/ Set2setRank

Set2setRank(A)_BPR: Set2setRank_BPR w/ adaptive Set2setRank ranking

Set2setRank(A) _GCN: Set2setRank_GCN w/ adaptive Set2setRank ranking

Results

Main results

https://gyazo.com/9ed4cb2b79a4f2531cfd6f0ce0793b11

全てのデータセットで提案法(Set2setRank_BPR(A))がベスト

Set2setRank_BPR(A) outperforms the best baselines by an average relative boost of 9%, 11% and 4% on Amazon Books MovieLens-20M, and Yelp, respectively

Influence of Set Size

https://gyazo.com/3c368d469348c9c9e138efb174d12bb3

we can observe that with the increase of 𝐾 and 𝐿, the results first increase and then decrease. The best setting is 𝐿 = 3,𝐾 = 50 fo rAmazonBookdataset, and 𝐿 = 3, 𝐾 = 50 for MovieLens-20M dataset.

Ablation Study

https://gyazo.com/e36b0496b7c85bec0538c216a0efa52e

https://gyazo.com/4d8d9a059a110be88b579bf9a852c013

item-to-set/set-to-set両方少しずつ効いていて、両方加味するとベスト

/icons/hr.icon

Discussion

Unobserved item set

①ユーザーによって観測されたが、ネガティブなユーザー行動（例：クリックしない）をもたらしたアイテム集合

②ユーザーが観測しなかったアイテム集合

①or②はimpログを見れば分かるので、この情報を明示的に負例のサンプリングに使うことはできそう

（負例を精緻にサンプルする）

（感想）

モチベーションが説得的、かつ結果も出ていて良い話

負例のサンプリングを頑張る意外にも、こういう方向性もあるのか...と気づかされた

丁寧なnotation、充実のrelated work、簡潔な記述と分かりやすい図で良い論文でした