BERTScore calculation example (from 『大規模言語モデル入門』 "Introduction to Large Language Models")
Python 3.10.9
pip install bert-score evaluate
bert-score==0.3.13
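code:compute_bertscore.py
# Don't save this as bert_score.py: that name would shadow the bert_score package imported by evaluate's metric
from pprint import pprint
import evaluate
def compute_bertscore(
    predictions: list[str], references: list[str]
) -> dict[str, float]:
    """Return BERTScore precision/recall/F1, each averaged over all prediction-reference pairs."""
    bertscore = evaluate.load("bertscore")
    bertscore.add_batch(predictions=predictions, references=references)
    results = bertscore.compute(lang="ja")
    return {
        k: sum(v) / len(v)
        for k, v in results.items()
        # results has the keys precision, recall, f1 (one value per pair) and hashcode (a string), so skip hashcode
        if k != "hashcode"
    }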
reference = "日本語T5モデルの公開"
prediction1 = "T5モデルの日本語版を公開"
prediction2 = "日本語T5をリリース"
prediction3 = "Japanese T5を発表"
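# 生成文N = "generated sentence N"; each prediction is scored against the same reference
bertscore_results = {
    "生成文1": compute_bertscore([prediction1], [reference]),
    "生成文2": compute_bertscore([prediction2], [reference]),
    "生成文3": compute_bertscore([prediction3], [reference]),
}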
pprint(bertscore_results)
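Roughly what `bertscore.compute` does internally, following the BERTScore paper (Zhang et al., 2020): every token of the prediction and of the reference gets a contextual embedding from a BERT-style model, and tokens are greedily matched by cosine similarity. Precision averages, over prediction tokens, the best similarity to any reference token; recall does the same from the reference side; F1 is their harmonic mean. The sketch below covers only this matching step and assumes the embeddings are already given as arrays. `bertscore_from_embeddings` and the toy vectors are made up for illustration; the real library also handles tokenization, special tokens, layer selection and optional IDF weighting (with `lang="ja"` it falls back to a multilingual BERT, `bert-base-multilingual-cased`, if I read bert-score 0.3.13 correctly).

```python
import numpy as np


def bertscore_from_embeddings(cand: np.ndarray, ref: np.ndarray) -> dict[str, float]:
    """Greedy-matching BERTScore on precomputed token embeddings (hypothetical helper).

    cand: (num_candidate_tokens, dim), ref: (num_reference_tokens, dim)
    """
    # L2-normalize each token vector so dot products become cosine similarities
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = cand @ ref.T  # sim[i, j] = cosine(candidate token i, reference token j)
    precision = float(sim.max(axis=1).mean())  # best reference match for each candidate token
    recall = float(sim.max(axis=0).mean())  # best candidate match for each reference token
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy 3-dim "embeddings" (made up): the candidate covers only the first reference token
ref = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
cand = np.array([[1.0, 0.0, 0.0]])
print(bertscore_from_embeddings(cand, ref))
# {'precision': 1.0, 'recall': 0.5, 'f1': 0.6666666666666666}
```

The candidate is "precise" (everything it says matches the reference) but only covers half of the reference, so recall is 0.5.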
code:output
{'生成文1': {'f1': 0.8897445797920227,
'precision': 0.8771318793296814,
'recall': 0.9027253985404968},
'生成文2': {'f1': 0.8504709601402283,
'precision': 0.8785963654518127,
'recall': 0.8240904211997986},
'生成文3': {'f1': 0.8090397119522095,
'precision': 0.8341161608695984,
'recall': 0.7854270935058594}}
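Each F1 above is the per-pair harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick offline check using the numbers copied from the output (`harmonic_f1` is just a helper name for this note):

```python
# Per-sentence scores copied from the output above (生成文 = "generated sentence")
scores = {
    "生成文1": {"precision": 0.8771318793296814, "recall": 0.9027253985404968, "f1": 0.8897445797920227},
    "生成文2": {"precision": 0.8785963654518127, "recall": 0.8240904211997986, "f1": 0.8504709601402283},
    "生成文3": {"precision": 0.8341161608695984, "recall": 0.7854270935058594, "f1": 0.8090397119522095},
}


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, s in scores.items():
    recomputed = harmonic_f1(s["precision"], s["recall"])
    # the library computes in float32, so only expect agreement to about 1e-6
    print(name, s["f1"], recomputed, abs(s["f1"] - recomputed) < 1e-6)
```

Sentence 2 has slightly higher precision than sentence 1 (0.8786 vs 0.8771) but a much lower recall (0.8241 vs 0.9027), which is why its F1 ends up lower.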
code:averaging.py
>>> import numpy as np
>>> np.average([result["f1"] for result in bertscore_results.values()])
0.8497517506281534
>>> np.average([result["precision"] for result in bertscore_results.values()])
0.8632814685503641
>>> np.average([result["recall"] for result in bertscore_results.values()])
0.8374143044153849
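These are macro averages: each metric is averaged over the three sentences on its own. Note that the averaged F1 is not the harmonic mean of the averaged precision and recall. Plugging the averages into 2PR / (P + R) gives about 0.85015 instead of 0.84975, and because the harmonic mean is concave, the F1 of the averages can never come out smaller than the average of the F1s. A small sketch using the printed averages (variable names are mine):

```python
# Averages printed by the REPL session above
mean_f1 = 0.8497517506281534
mean_precision = 0.8632814685503641
mean_recall = 0.8374143044153849

# Harmonic mean of the averaged precision and recall (a different quantity from mean_f1)
f1_of_means = 2 * mean_precision * mean_recall / (mean_precision + mean_recall)

print(f"mean of per-sentence F1: {mean_f1:.5f}")
print(f"F1 of mean P and mean R: {f1_of_means:.5f}")
```

When reporting a single number for a set of outputs, it's worth stating which of the two was used.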
TODO: take IDF weighting into account (read the paper to understand it better?)
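As a starting point for the TODO: the BERTScore paper (Zhang et al., 2020) optionally weights each token by its inverse document frequency over the M reference sentences, idf(w) = -log((1/M) * #{references containing w}), so recall becomes the IDF-weighted average of each reference token's best cosine match (precision is weighted the same way with the candidate's tokens). The sketch below implements only the recall side, with the paper's unsmoothed formula, on toy tokens and one-hot vectors; `idf_weights`, `idf_weighted_recall` and the data are hypothetical, and the bert_score library applies its own smoothing, so its numbers would not match exactly. In evaluate the weighting is switched on with `bertscore.compute(..., idf=True)`, and the IDF is estimated from the references passed in. With a single reference per call, as in `compute_bertscore` above, every reference token would get -log(1) = 0 under the paper's formula, so the weighting only becomes meaningful with many references.

```python
import math
from collections import Counter

import numpy as np


def idf_weights(ref_corpus: list[list[str]]) -> dict[str, float]:
    """Paper-style IDF: idf(w) = -log(df(w) / M) over M tokenized reference sentences."""
    m = len(ref_corpus)
    # document frequency: count each token at most once per sentence
    df = Counter(tok for sent in ref_corpus for tok in set(sent))
    return {tok: -math.log(count / m) for tok, count in df.items()}


def idf_weighted_recall(
    ref_tokens: list[str], ref_emb: np.ndarray, cand_emb: np.ndarray, idf: dict[str, float]
) -> float:
    """Recall where each reference token's best cosine match is weighted by its IDF."""
    ref_emb = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    cand_emb = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    best = (ref_emb @ cand_emb.T).max(axis=1)  # best candidate match per reference token
    # tokens missing from the IDF table get weight 0 here (a simplification, not the paper's smoothing)
    weights = np.array([idf.get(tok, 0.0) for tok in ref_tokens])
    if weights.sum() == 0:
        raise ValueError("all IDF weights are zero (e.g. only one reference sentence)")
    return float((weights * best).sum() / weights.sum())


# Toy data (hypothetical): three tokenized reference sentences and one-hot "embeddings"
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
idf = idf_weights(corpus)
ref_tokens = corpus[0]
ref_emb = np.eye(3)  # "the", "cat", "sat" as orthogonal vectors
cand_emb = np.eye(3)[:2]  # the candidate matches "the" and "cat" but misses "sat"
print(idf_weighted_recall(ref_tokens, ref_emb, cand_emb, idf))
```

Unweighted recall would be 2/3 here; with IDF the missed token "sat", which occurs in only one of the three references, weighs more than the common "the" and "cat", so the weighted recall drops to about 0.42.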