HumanEval
task_id: identifier for the data sample
prompt: input for the model containing function header and docstrings
"def return1():\n"
canonical_solution: solution for the problem in the prompt
" return 1"
test: contains function to test generated code for correctness
entry_point: entry point for test
IMO:どうやって評価しているんだろう?(文字列一致でよい?)