Measuring and Mitigating Unintended Bias in Text Classification

What is it?

A study that identifies unintended bias in toxicity classification and then works to debias the model.

Unintended Bias

Every machine learning model is designed to express a bias. For example, a model trained to identify toxic comments is intended to be biased such that comments that are toxic get a higher score than those which are not. The model is not intended to discriminate between the gender of the people expressed in a comment - so if the model does so, we call that unintended bias. We contrast this with fairness which we use to refer to a potential negative impact on society, and in particular when different individuals are treated differently.

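Unintended bias arises from imbalance in the dataset.

In a model that classifies toxic messages, many of the toxic messages in the training data are directed at people with a particular identity, so the identity terms themselves unintentionally pick up a toxicity bias.

To avoid this, non-toxic sentences containing those identity terms are added to the training data with non-toxic labels.

This is done in an unsupervised way.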

We mined the new data from Wikipedia articles themselves. Since the text comes from the published article, we assume that the text is non-toxic, which we validated by labeling 1000 comments, 99.5% of them were confirmed non-toxic. Using unsupervised, assumed non-toxic article data enables the data balancing to be done without additional human labeling.
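The balancing step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy comments, the `nontoxic_pool` sentences, the whitespace tokenization, and the stopping rule (add sentences until a term's toxic fraction drops to the overall toxic fraction) are all assumptions made for the example.

```python
def toxic_fraction(examples, term=None):
    """Fraction of examples labeled toxic (1), optionally only those containing `term`."""
    subset = [(text, label) for text, label in examples
              if term is None or term in text.lower().split()]
    if not subset:
        return 0.0
    return sum(label for _, label in subset) / len(subset)


def balance(examples, identity_terms, nontoxic_pool):
    """Append assumed non-toxic sentences until each identity term's toxic
    fraction is no higher than the overall toxic fraction of the original data.
    The stopping rule is an assumption of this sketch."""
    target = toxic_fraction(examples)
    balanced = list(examples)
    for term in identity_terms:
        candidates = [s for s in nontoxic_pool if term in s.lower().split()]
        for sentence in candidates:
            if toxic_fraction(balanced, term) <= target:
                break
            balanced.append((sentence, 0))  # label 0 = assumed non-toxic
    return balanced


# Toy training data (hypothetical): (text, label), label 1 = toxic.
examples = [
    ("shut up you gay idiot", 1),
    ("gay people are awful", 1),
    ("i am gay", 0),
    ("you are an idiot", 1),
    ("have a nice day", 0),
    ("thanks for the edit", 0),
    ("this article is great", 0),
    ("please add a source", 0),
]

# Hypothetical stand-in for sentences mined from article text.
nontoxic_pool = [
    "the city hosts a gay pride parade every year",
    "the film portrays a gay couple",
    "he was an openly gay politician",
    "the novel has a gay protagonist",
]

balanced = balance(examples, ["gay"], nontoxic_pool)
print("before:", toxic_fraction(examples, "gay"))
print("after: ", toxic_fraction(balanced, "gay"))
```

Because the added sentences are only assumed to be non-toxic, a spot check of a sample (as the authors did with 1000 comments) is still worthwhile before training on them.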

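What is impressive compared to prior work?

It proposes metrics for measuring the presence of bias.

What is the key to the technique or method?

How was it validated?

1. A holdout set of the Wikipedia comments used as training data

Checks whether overall performance drops.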

2. Identity Phrase Templates Test Set

Checks whether performance differs across identities.

3. AUC
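The template-based check and the AUC measurement listed above can be illustrated with a small sketch. Everything here is hypothetical: the templates, the identity terms, the 0.5 threshold, and `toy_score`, which stands in for a trained classifier and is deliberately biased on one term. The per-term false positive rate is one possible way to compare identities and is not necessarily the exact metric used in the paper.

```python
from itertools import product

# Hypothetical templates: (pattern, label), label 1 = toxic.
TEMPLATES = [
    ("i am a {} person", 0),
    ("being {} is fine", 0),
    ("i hate all {} people", 1),
    ("{} people are disgusting", 1),
]
IDENTITY_TERMS = ["gay", "straight", "tall"]


def build_template_set(templates, terms):
    """Fill every template with every identity term."""
    return [(pattern.format(term), label, term)
            for (pattern, label), term in product(templates, terms)]


def auc(scores, labels):
    """Probability that a random toxic example outscores a random
    non-toxic one; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def toy_score(text):
    """Stand-in for a trained model, with a deliberate bias on one term."""
    words = text.split()
    score = 0.1
    if "hate" in words or "disgusting" in words:
        score += 0.5
    if "gay" in words:
        score += 0.6  # simulated unintended bias
    return min(score, 1.0)


def false_positive_rate(rows, threshold=0.5):
    """Share of non-toxic rows scored at or above the threshold."""
    negatives = [text for text, label, _ in rows if label == 0]
    flagged = sum(toy_score(text) >= threshold for text in negatives)
    return flagged / len(negatives)


test_set = build_template_set(TEMPLATES, IDENTITY_TERMS)
scores = [toy_score(text) for text, _, _ in test_set]
labels = [label for _, label, _ in test_set]
overall_auc = auc(scores, labels)

fpr_by_term = {
    term: false_positive_rate([row for row in test_set if row[2] == term])
    for term in IDENTITY_TERMS
}

print("overall AUC:", round(overall_auc, 3))
print("false positive rate by term:", fpr_by_term)
```

A large gap between terms in `fpr_by_term` indicates that the model treats otherwise identical sentences differently depending on the identity term, which is the kind of difference the template test set is meant to expose.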

Is there any discussion?

Which papers should be read next?