Importing HTML documents into Obsidian

I’ve searched here, but I can’t find anything.

I’m wondering if there is any add-on to import the HTML documents that I’ve saved off the Internet, into Obsidian notes… so that when I search for keywords in Obsidian, I would be searching for said keywords throughout my HTML documents.


P.S. Just noticed this:

“There is now a function available for converting HTML to Markdown called htmlToMarkdown, which is using a pre-configured Turndown Service.”

But that’s in the Developer section of the latest release, so I’m not sure how that relates to everyday usage of Obsidian.

Well, as per my post here: Is it easier/better to convert .htm software documentation to .md or save PDFs and link manually? - #7 by N1755L

I’ve found out that with MarkDownload, I am able to highlight text on a webpage, and then just drag that highlighted text straight into an Obsidian note, so I’ll mark this thread as solved.

Still, I think it is better to archive stuff this way, preferably by saving it into Zotero, highlighting it there and linking to it, then later using parts of it in notes. Converting web pages to Markdown is not going to be perfect, and even then the current solutions don’t do a good job of capturing images; the process is not reliable yet and needs micromanagement and double-checking.

True, it’s not a good copy of the original HTML, which is why I save the HTML file as well. But what I really wanted was the ability to search through all my saved HTML files for certain words, and the method I mentioned will allow me to find information in them… not sure if Zotero would do this, though.
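For what it’s worth, a keyword search over saved pages can also be done outside Obsidian. Below is a minimal Python sketch, assuming the saved pages are .html/.htm files under one folder and are readable as UTF-8. The names `html_to_text` and `search_saved_pages` are made up for illustration, and it uses only Python’s standard library, nothing from Obsidian, MarkDownload or Zotero.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects the text of a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text chunks of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def search_saved_pages(folder, keyword):
    """Map each matching file (path relative to folder) to its matching lines.

    The match is case-insensitive and is done on extracted text only,
    so tag names and attributes never count as hits.
    """
    needle = keyword.lower()
    hits = {}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in (".html", ".htm"):
            continue
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        lines = [line for line in text.splitlines() if needle in line.lower()]
        if lines:
            hits[str(path.relative_to(folder))] = lines
    return hits
```

Calling `search_saved_pages("saved_pages", "spike protein")` (the folder name is hypothetical) would return a dictionary of files and the lines of text containing the phrase. One limitation of this sketch: a phrase that is split across tags, such as a word in bold in the middle of it, ends up on separate lines and will not match.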

Not sure how you all feel about .pdf’s, but since the majority of the content I consume is from the web, I’ve started saving web pages to Zotero, along with a .pdf generated using this clipper for Chrome.

I can then annotate/highlight the document and export all the content to Obsidian using Zotfile/MDNotes. There’s a bit of a learning curve with those two, but it’s been well worth it in my opinion. I prefer the much lighter .pdf to the output from MarkDownload, although I do use that on occasion as well (mostly for small extractions, like @N1755L mentioned above).

The advantage with Zotero is that in addition to the .pdf, you can also save an .htm snapshot of the page for later review if needed. I’m much more confident that I captured everything from the page, and didn’t miss anything in a conversion.

I haven’t tried this, but since I export the annotations and notes into Obsidian, I haven’t had the need to.

The ability to search inside files is surely very important; I think Zotero should add such a feature (I don’t think it has it now). It is one reason why I am converting my important books into Markdown in Obsidian, to have just that search ability across multiple files. But I don’t think it is practical to do that for every file; it is just going to mess up the vault. In my experience, Obsidian is not good reference management software, and it is not intended to do that kind of thing. So I think Zotero + Obsidian is the way to go.

I hadn’t considered that when saving the HTML files… I thought that since saving a page creates a folder alongside the HTML file with all the page’s files, photos, etc., that was good enough for a snapshot. But the other day I was surprised by an HTML file I had saved from a page that is no longer on the CBC website: when I opened my local copy, even offline, I was not able to view the page properly.

Maybe a .pdf snapshot is a good idea. Especially with this Covid stuff of late, there have been instances where pages have changed… I was reading somewhere that this page had changed: The novel coronavirus’ spike protein plays additional key role in illness - Salk Institute for Biological Studies

So in such a case, if I had taken a snapshot of that page in Zotero before they made changes to it, I would have a copy of the original version on my hard drive, and that version would not change even if I fired up my local copy and it accessed the originating website. Is that correct?

