F1 score simulation

A while ago there was a Kaggle competition in which you had to predict which products a customer would buy in their next order.

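On the model side, there are two predictions for the next order:

the predicted number of items

the predicted purchase probability of each product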

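If we keep these two as separate (independent?) models, we first predict the number of items and then pick that many products.

Here we look at the case where the number of chosen items is known: out of 100 products, exactly 1, 2, ... items are picked. The simulation below checks what F1 score a random ("monkey") guess of the same size gets.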
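The two-model procedure above (predict a count, then take that many products) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the competition's or the author's actual code: `select_top_k`, `pred_proba` and `pred_count` are hypothetical names, and the probabilities are made-up toy values.

```python
import numpy as np

def select_top_k(pred_proba, pred_count):
    """Return the IDs of the pred_count products with the highest predicted probability.

    pred_proba: 1-D array, predicted purchase probability per product (hypothetical model output)
    pred_count: predicted number of items in the next order (hypothetical model output)
    """
    # round the predicted count and clamp it to a valid size
    k = max(0, min(int(round(pred_count)), len(pred_proba)))
    if k == 0:
        return set()
    # sort descending by probability and keep the first k product IDs
    top = np.argsort(pred_proba)[::-1][:k]
    return set(top.tolist())

# toy example: 5 products, the count model says about 2 items
pred_proba = np.array([0.10, 0.70, 0.05, 0.40, 0.20])
print(select_top_k(pred_proba, 2.2))  # {1, 3}
```

Because the count and the probabilities come from separate models, the F1 of this selection depends on both; the simulation below isolates the case where the count is exactly right.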

The corresponding scikit-learn entry
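The set-based F1 used in this note is the same as the binary F1 on 0/1 indicator vectors over all products, which is what `sklearn.metrics.f1_score(y_true, y_pred)` computes with its default `average='binary'`. A dependency-free sketch of that correspondence; `to_indicator` and `f1_from_indicators` are hypothetical helper names and the sets are toy data.

```python
import numpy as np

def to_indicator(items, n_products):
    """Turn a set of product IDs into a 0/1 vector of length n_products."""
    v = np.zeros(n_products, dtype=int)
    v[list(items)] = 1
    return v

def f1_from_indicators(y_true, y_pred):
    """Binary F1 from 0/1 vectors via true positive / false positive / false negative counts."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

n_products = 100
ans = {3, 7, 42}       # actually bought (toy data)
prd = {7, 42, 99, 5}   # predicted (toy data)
y_true = to_indicator(ans, n_products)
y_pred = to_indicator(prd, n_products)
print(f1_from_indicators(y_true, y_pred))  # 4/7, about 0.571
```

Here precision is 2/4 and recall is 2/3, and 2·P·R/(P+R) gives the same 4/7 as 2·TP/(2·TP+FP+FN).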

code:python

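import itertools

import numpy as np

n_products = 100
products = np.arange(n_products)  # product IDs 0..99
n_trials = 1000  # repetitions per re_rate (assumed value; not given in the original)

def f1(ans, prd):
    """Return (f1, precision, recall) of the predicted set prd against the actual set ans."""
    hit = len(prd & ans)
    precision = hit / len(prd) if prd else 0.0  # share of predicted items that were bought
    recall = hit / len(ans) if ans else 0.0  # share of bought items that were predicted
    ret = 0 if precision == 0 or recall == 0 else 2 * (precision * recall) / (precision + recall)
    return (ret, precision, recall)

def rep_n(re_rate):
    # number of items per order; round() because r * 100 can land just below an integer
    k = int(round(re_rate * n_products))
    # replace=False draws k distinct products, so both sets really have size k
    actual_bought = np.random.choice(products, k, replace=False)
    monkey_choice = np.random.choice(products, k, replace=False)  # random guess of the same size
    return f1(set(actual_bought), set(monkey_choice))

print("re_rate, f1, precision, recall")

for r in itertools.islice(itertools.count(start=0.01, step=0.01), 60):
    ret = np.array([rep_n(r) for _ in range(n_trials)])  # shape (n_trials, 3)
    ret0 = ret.copy()
    ret0[ret0 == 0] = np.nan  # treat zero scores as missing: mean over trials with at least one hit
    print(f're_rate:{round(r, 2)}, score:{ret.mean(axis=0).round(3)}, score_no_zero:{np.nanmean(ret0, axis=0).round(3)}')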
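Since both sets have the same size k here, precision = recall = F1 = h/k, where h is the number of hits. For a uniformly random guess, h follows a hypergeometric distribution with mean k²/N, so the expected F1 is simply k/N, i.e. re_rate itself. The sketch below computes this exactly, together with the mean over trials with at least one hit (the counterpart of score_no_zero), so the simulated numbers can be checked against it. `expected_f1` is a hypothetical helper and assumes the same setting as the simulation: N = 100, equal set sizes, sampling without replacement.

```python
from math import comb

def expected_f1(k, n=100):
    """Exact E[F1] and E[F1 | at least one hit] for two random k-subsets of n products.

    With equal sizes, F1 = precision = recall = hits / k, and hits is hypergeometric.
    """
    total = comb(n, k)
    # P(hits = h) = C(k, h) * C(n - k, k - h) / C(n, k); comb() returns 0 for impossible h
    probs = [comb(k, h) * comb(n - k, k - h) / total for h in range(k + 1)]
    mean_f1 = sum(p * h / k for h, p in enumerate(probs))
    p_hit = 1.0 - probs[0]  # probability of at least one hit
    mean_f1_nonzero = mean_f1 / p_hit if p_hit > 0 else float("nan")
    return mean_f1, mean_f1_nonzero

for k in (1, 10, 30, 60):
    mean_f1, mean_nz = expected_f1(k)
    print(f"re_rate:{k / 100:.2f}, E[f1]:{mean_f1:.3f}, E[f1 | f1>0]:{mean_nz:.3f}")
```

For k = 1 this gives E[F1] = 0.01 but E[F1 | F1>0] = 1.0: a single-item guess almost always misses, and when it hits, it is perfect. That is why score and score_no_zero diverge so much at small re_rate.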