HTML/MD converter

This is not exactly a plugins request. It’s more like a request for a small utility to use alongside Obsidian. I’d love to have a non-programmer-friendly way to convert HTML files – preferably either individually or as a batch – to MD so I can put them in Obsidian.

If the same utility could also do EML to MD, that would be a bonus.


One way to do it is to open the HTML file in a browser, select all, copy, and paste into Obsidian (with Settings > Editor > “Auto convert HTML” turned on).

Obviously this isn’t ideal for batches of files. Also, the conversion isn’t always great, but that’s probably true of any HTML-to-MD conversion to some extent.


Yeah, there are lots of options for one-off files, but when exporting from other software, there are often dozens or even hundreds of files to deal with. :confused:

As a potential workaround for now, you might try the HTML Reader community plugin, which lets you open HTML files in Obsidian. I’m not sure whether the files behave the way HTML usually does in Source mode, Live Preview, and Reading view.

I. As you say, there are a lot of one-off options. This one …
https://codebeautify.org/html-to-markdown
… lets you speed up the process a little. If the URLs of your HTML files can be copied into an online file, you could do a regex search and replace-all to add …
https://codebeautify.org/html-to-markdown?url=
in front of each URL, e.g.
https://codebeautify.org/html-to-markdown?url=https://obsidian.md/
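
If you would rather not do the search and replace by hand, here is a rough Python sketch of the same idea. The prefix is the one from this post. The function name and the second sample URL are made up for illustration, and it assumes there is one URL per line.

```python
import re

# Prefix taken from the post above; the rest of this sketch is illustrative.
CONVERTER_PREFIX = "https://codebeautify.org/html-to-markdown?url="


def add_converter_prefix(text: str) -> str:
    """Put the converter prefix in front of every line that starts with http(s)://."""
    # ^ with re.MULTILINE matches at the start of each line; the lookahead
    # makes sure only lines that begin with a URL are touched.
    return re.sub(r"^(?=https?://)", CONVERTER_PREFIX, text, flags=re.MULTILINE)


# One URL per line; example.com is a placeholder, not a real export.
urls = "https://obsidian.md/\nhttps://example.com/page.html"
print(add_converter_prefix(urls))
```

Lines that do not start with a URL are left alone. Running it twice on the same text would add the prefix twice, so keep the original list around.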

II. Plugins
The Obsidian Pandoc Plugin might only do the opposite i.e. markdown TO html.
The Obsidian Enhancing Export Plugin I believe has the option of doing html TO markdown if you have your html files in your vault.

III. You could try Python …
How to Convert HTML to Markdown in Python
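
To give an idea of what the Python route involves, here is a minimal sketch that uses only the standard library. It is not the method from the linked article, and it is not a replacement for a real converter: it handles only headings, paragraphs, bold, italic, links, and list items, and it will happily dump the contents of script and style tags into the output. The names `MiniMarkdown` and `html_to_markdown` are my own.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown sketch covering a handful of common tags."""

    def __init__(self):
        super().__init__()
        self.out = []     # pieces of Markdown, joined at the end
        self.href = None  # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Collapse runs of whitespace to one space, roughly as a browser does.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to Markdown using the sketch parser above."""
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    # Squeeze three or more newlines down to a single blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


sample = '<h1>Title</h1>\n<p>Some <b>bold</b> text and a <a href="https://obsidian.md/">link</a>.</p>'
print(html_to_markdown(sample))
```

For anything beyond simple pages (tables, images, nested lists, code blocks), a dedicated library or Pandoc will do a much better job.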

IV. Pandoc to the rescue?
Converting HTML to Markdown using Pandoc
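
For the batch case, a small Python wrapper around Pandoc might look like the sketch below. It assumes Pandoc is installed and on your PATH. The folder names in the usage comment are hypothetical, and the function names are my own. The `--from`, `--to`, and `--output` options are standard Pandoc options.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src: Path, dest: Path) -> list[str]:
    """Build a Pandoc call that reads HTML and writes Markdown."""
    return ["pandoc", "--from", "html", "--to", "markdown",
            str(src), "--output", str(dest)]


def convert_folder(in_dir: str, out_dir: str) -> list[Path]:
    """Convert every .html file directly inside in_dir to a .md file in out_dir."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob("*.html")):
        dest = out / (src.stem + ".md")
        # check=True raises an error if Pandoc fails on a file.
        subprocess.run(pandoc_command(src, dest), check=True)
        written.append(dest)
    return written


# Usage (hypothetical folder names):
# convert_folder("exported_html", "MyVault/Imported")
```

It only looks at the top level of the input folder, and files with the same name in the output folder are overwritten. If Pandoc's default Markdown flavor produces syntax Obsidian does not render well, `--to gfm` or `--to commonmark` may be worth trying instead.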


Oh yeah, Pandoc would do it.


Hi. I just wrote a utility that converts HTML to Obsidian-style Markdown, with support for math in various styles, within-document links (i.e. the [[#...]] and [[#^...]] syntax), and within-site hyperlinks. It lives on GitHub at kkew3/html2obsidian. I’ve uploaded some sample HTML files and the corresponding converted Markdown in the sample_html/ and sample_output/ directories there. You can check whether it suits your use case. Thanks.


Also, the Obsidian team has released a community plugin called Importer that can now import HTML.


There is a rather new CLI utility here: mrusme/reader on GitHub.

reader is for your command line what the “readability” view is for modern browsers: a lightweight tool offering better readability of web pages on the CLI.

It can output Markdown.

Binaries are available directly on GitHub.