pdfminer.six
https://github.com/pdfminer/pdfminer.six
Pdfminer.six is a community maintained fork of the original PDFMiner.
extract_text
https://pdfminersix.readthedocs.io/en/latest/reference/highlevel.html#extract-text
The extracted text was just \x0c
Scanned PDFs cannot be read
https://github.com/pdfminer/pdfminer.six/issues/597#issuecomment-804797643
Unfortunately pdfminer does not support ocr.
👉 This ties in with the implementation of Unstructured's partition_pdf!
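A minimal sketch of how the `\x0c`-only result above could be detected, assuming pdfminer.six is installed. `extract_text` is the real high-level function from `pdfminer.high_level`; the helper names `is_effectively_empty` and `extract_or_none` are hypothetical and only for illustration. An empty result is treated here as a hint that the PDF is probably scanned and has no text layer, which is an assumption, not something pdfminer.six reports itself.

```python
def is_effectively_empty(text: str) -> bool:
    """Return True when the text holds only whitespace, including form feeds."""
    # str.strip() with no arguments also removes "\x0c" (form feed).
    return text.strip() == ""


def extract_or_none(pdf_path: str):
    """Return the text of a PDF, or None when no usable text came back."""
    # Imported lazily so the helper above works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    text = extract_text(pdf_path)
    # Assumption: an empty result suggests a scanned PDF that would need OCR.
    return None if is_effectively_empty(text) else text
```

When `extract_or_none` returns None, the file would have to go through a separate OCR step, since pdfminer does not provide one.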