Substance - An HTML-to-Markdown extractor

reorx · January 17, 2023, 3:09pm

Hi all,

I’m building a tool that extracts the main content of the current web page and converts it to Markdown for archiving. So far I’ve finished a web app as a proof of concept, a preview ahead of the final release. It can extract Wikipedia articles and download them as Markdown files. Here’s the link:

The goal of the product is to be an alternative to MarkDownload with more extensibility. MarkDownload has been a great help for archiving content from the web, but it doesn’t work well on every website. Every now and then I’ve found that it gives poor results on some sites, such as Wikipedia (which is why I chose Wikipedia as the first example to work on).

After releasing this web app, I’ll focus on developing the extension and writing documentation for the product. The code is open source here, though it has no README yet. You can leave feedback in the issues, or reply here if you like.

Ldjd · October 5, 2023, 10:22pm

This is awesome! When will it be ready?