Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension
論文.icon
Date
2023-02-01
Abstract
To precisely evaluate a language model's capability for logical reading comprehension, we present a dataset for testing the understanding of the rationale behind critical reasoning. For questions taken from an existing multiple-choice logical reading comprehension dataset, we crowdsource rationale texts that explain why we should select or eliminate answer options, resulting in 3,003 multiple-choice subquestions that are associated with 943 main questions. Experiments on our dataset show that recent large language models (e.g., InstructGPT) struggle to answer the subquestions even if they are able to answer the main questions correctly. We find that the models perform particularly poorly in answering subquestions written for the incorrect options of the main questions, implying that the models have a limited capability for explaining why incorrect alternatives should be eliminated. These results suggest that our dataset encourages further investigation into the critical reasoning ability of language models while focusing on the elimination process of relevant alternatives.
What is it?
How does it improve on prior work?
What is the key to the technique or method?
How was its effectiveness validated?
Is there any discussion?
What papers should be read next?
Authors