Converting HTML (from Nimbus Notes) to Markdown

What I’m trying to do

I'm trying to import HTML from Nimbus Notes into Obsidian, and it's a bit of a mess. Nimbus exports everything into zip files. Each zip unpacks to a folder named after the note, and that folder contains the HTML file (named note.html) and an assets folder.

I can import those into Obsidian via Importer, but the problem is that Importer doesn't respect the folder structure. It converts all HTML files from all folders and subfolders to Markdown and puts them into one folder. Really a mess. I want to get the Markdown files into Obsidian in their original folder structure from Nimbus, with their resources in a folder named _resources, which contains a folder named after the note.

Things I have tried

I’ve accomplished this much already:

My original script unzipped the archives into their own folders, recursively, and renamed each note.html by replacing “note” with the containing folder's name. So each unzipped folder now contains an HTML file with its associated assets folder, and possibly further subfolders that each contain their own HTML file and assets folder, and so on, up to 7 levels deep.
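For readers who want to reproduce the renaming step, here is a minimal sketch of how it could be done. It is not the original script: the function name rename_notes is made up for illustration, and it assumes the zips are already unpacked and that file names contain no newline characters.

```shell
#!/bin/sh
# Sketch only: rename every note.html to "<containing folder>.html".
# rename_notes is a hypothetical helper name, not part of any tool.
rename_notes() {
  root_dir=$1
  find "$root_dir" -type f -name 'note.html' | while IFS= read -r html; do
    note_dir=$(dirname "$html")        # folder that holds note.html
    note_name=$(basename "$note_dir")  # that folder's own name
    target="$note_dir/$note_name.html"
    # Skip the edge case of a folder that is itself called "note".
    [ "$html" = "$target" ] && continue
    mv "$html" "$target"
  done
}

# Usage (root_dir is whatever folder holds the unzipped notes):
#   rename_notes "$root_dir"
```

Only files are renamed, never folders, so the walk that find is doing over the tree is not disturbed while the loop runs.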

What I want next (but have not managed) is a script that uses pandoc to convert those HTML files into md (Markdown) files, with each note's assets placed in a folder named _resources, which sits in the same folder as the associated md file.

I want all of those files to stay in the same folder structure within the source directory. So the HTML files are not moved at this point, and each md conversion sits in the same folder as the HTML file it came from, with its own _resources folder in that same folder.
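The layout described above could be sketched roughly as follows. This is a sketch under assumptions, not a tested solution for Nimbus exports: convert_notes is a made-up function name, the links in the converted Markdown are assumed to start with "assets/", and only spaces in the note name are percent-encoded when the links are rewritten. The assets are copied rather than moved so that the original HTML files keep working.

```shell
#!/bin/sh
# Sketch only: convert each *.html to Markdown beside it and copy the
# sibling assets folder to _resources/<note name>/.
# convert_notes is a hypothetical name; the "assets/" link prefix is an
# assumption about what the exported HTML references.
convert_notes() {
  root_dir=$1
  find "$root_dir" -type f -name '*.html' | while IFS= read -r html; do
    note_dir=$(dirname "$html")
    note_name=$(basename "$html" .html)
    md="$note_dir/$note_name.md"
    res="$note_dir/_resources/$note_name"

    # Convert in place; the HTML file itself is left untouched.
    pandoc "$html" -t gfm-raw_html --wrap=none -o "$md" < /dev/null || continue

    # Copy the assets once; later runs reuse the existing copy.
    if [ -d "$note_dir/assets" ] && [ ! -e "$res" ]; then
      mkdir -p "$note_dir/_resources"
      cp -R "$note_dir/assets" "$res"
    fi
    [ -d "$res" ] || continue   # note has no attachments

    # Encode spaces for use in a link, then escape characters that are
    # special in a sed replacement (backslash, ampersand, the | delimiter).
    link_name=$(printf '%s' "$note_name" | sed -e 's/ /%20/g' -e 's/[\\&|]/\\&/g')

    # Point Markdown links at the new location, via a temp file so the
    # same command works with both GNU and BSD sed.
    sed "s|](assets/|](_resources/$link_name/|g" "$md" > "$md.tmp" &&
      mv "$md.tmp" "$md"
  done
}

# Usage:
#   convert_notes "$root_dir"
```

If the copies are not wanted, the cp -R line can be swapped for mv, at the cost of breaking the image links in the original HTML files.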

Would love it if anyone could help me with this.

I’m just using scripts in the zsh shell.

Oh, and another thing. There’s something weird with the way pandoc converts the HTML file to md. All the text and images from the original Nimbus notes are converted to md, but the md note also contains all sorts of gibberish: lines starting with three colons (:::) followed by text enclosed in curly brackets. It creates so much clutter as to render the note virtually unreadable. Is there any way to configure pandoc so it doesn’t do that?

The 3 colons etc. thing sounds like you may have some unwanted pandoc option switched on. Could you paste the command you’re using into a code block here?

“gfm-raw_html” to that line.

It appears as though the gibberish is gone, which is great!

Someone on StackOverflow also provided this explanation and suggested fixes:

Those are a pandoc markdown extension, fenced divs. Apparently Obsidian’s markdown dialect doesn’t support them. That’s fine, you can disable them by running pandoc with -t markdown-fenced_divs. In that case you may get some raw HTML div tags; to disable all of this you can use -t markdown-fenced_divs-native_divs-raw_html. Or you could try something like -t commonmark or -t gfm or -t markdown_strict. Pandoc supports many different markdown dialects.

Here’s the full thread with a copy of the script I’m using:

RESOLVED

Update:

For some reason my comment above got truncated and I cannot edit it anymore. In any case, the updated script that fixed that problem is below:

find "$root_dir" -name '*.html' -type f -exec sh -c 'pandoc "$1" -t gfm-raw_html --wrap=none -o "${1%.html}.md"' _ {} \;

Note that the -t gfm-raw_html piece fixed the gibberish from fenced divs.

There was still another problem, though: text was getting broken up by frequent, seemingly random line breaks. Pandoc wraps output lines to a fixed column width by default (72 columns, as far as I can tell; I’ve read inconsistent things about that). In any event, the --wrap=none piece fixed that as well.
