PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
https://arxiv.org/abs/2206.10498