unstructured
Unstructured-IO/unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Trying out Unstructured, a preprocessing tool for natural-language data for ML services | npaka