partition_pdf
https://unstructured-io.github.io/unstructured/core/partition.html#partition-pdf
#Unstructured
https://github.com/Unstructured-IO/unstructured/blob/0.12.6/unstructured/partition/pdf.py#L230
The default strategy auto will determine when a page can be extracted using fast mode, otherwise it will fall back to hi_res.
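A minimal sketch of choosing the strategy explicitly instead of relying on `auto`. The helper names `partition_kwargs` and `load_elements` are hypothetical, and the list of strategies is my understanding of the values accepted around 0.12.x, so check it against the linked source. `unstructured` is imported lazily so the validation helper works without the package installed.

```python
# Strategy names believed to be accepted by partition_pdf (assumption: 0.12.x).
VALID_STRATEGIES = ("auto", "fast", "hi_res", "ocr_only")


def partition_kwargs(filename, strategy="auto"):
    """Build keyword arguments for partition_pdf, rejecting unknown strategies."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {"filename": filename, "strategy": strategy}


def load_elements(filename, strategy="auto"):
    """Partition a PDF into elements (requires the unstructured package)."""
    # Imported here so the helper above stays usable without unstructured.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(**partition_kwargs(filename, strategy))
```

For example, `load_elements("sample.pdf", strategy="hi_res")` would force the layout-model path even for pages that `auto` would send to `fast`; `sample.pdf` is a placeholder path.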
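As you would expect from a tool built to prepare data for RAG, the extracted text is excellent (see the video below).
Reportedly, Japanese text does not come through in the metadata's text_as_html (TODO: check the source).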
https://docs.unstructured.io/examplecode/codesamples/apioss/table-extraction-from-pdf#method-1%3A-using-partition-pdf
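One way to check the text_as_html remark empirically, as a rough sketch: partition with table structure inference turned on, then compare whether Japanese characters appear in each table's `text` and in its `metadata.text_as_html`. The helpers `contains_japanese` and `table_html_report` are hypothetical names, and the character ranges (hiragana, katakana, common CJK ideographs) are a simplification.

```python
import re

# Hiragana + katakana (U+3040-U+30FF) and CJK unified ideographs (U+4E00-U+9FFF).
_JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def contains_japanese(text):
    """Return True if text has at least one kana or CJK ideograph."""
    return bool(text) and bool(_JAPANESE.search(text))


def table_html_report(filename):
    """For each Table element, report Japanese presence in text vs. text_as_html."""
    # Lazy import: requires unstructured with its hi_res dependencies installed.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(
        filename=filename,
        strategy="hi_res",
        infer_table_structure=True,  # populates metadata.text_as_html for tables
    )
    report = []
    for element in elements:
        if element.category == "Table":
            html = element.metadata.text_as_html
            report.append((contains_japanese(element.text), contains_japanese(html)))
    return report
```

A pair of `(True, False)` for a table would be consistent with the remark above: Japanese in the plain text but missing from the HTML.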
https://www.youtube.com/watch?v=E-tupjji22U