html2text
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text.
Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
It apparently converts HTML into Markdown.
In other words, it is the reverse of markdown (which converts Markdown into HTML).
If the goal is only to strip the tags, a different tool (gensim, for example) is probably more appropriate.
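
For comparison, plain tag stripping can be sketched with the standard library alone. This is not gensim's implementation; it uses `html.parser` as a stand-in, and the names `TagStripper` and `strip_tags` are hypothetical helpers introduced here.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags (this includes script/style bodies).
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(strip_tags("<p>Hello <b>world</b>!</p>"))  # prints: Hello world!
```

Unlike html2text, this discards the structure entirely: headings, links, and emphasis all collapse into bare text.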