New Tool for Migration from OneNote - updated and improved version

Hi Mada34,

Sorry, it’s been 30 days since your post. I’ve been offline for a while so you may have found a way to convert your HTML files.

The core of the HTML to markdown conversion in most programs is Pandoc.

I’ve never thought of doing a conversion like this, so thanks for the question.

This solution is just a first quick approach that worked on a couple of test HTML files I pulled from my Synology project. Hopefully, it works for everybody else :slight_smile:

Install Pandoc from https://pandoc.org/installing.html

Then unzip the attached bash file and put it in the folder with your HTML files.

html-md.sh.zip (700 Bytes)

At the command line, change to the folder with the bash file in it and type
chmod 700 html-md.sh
This makes sure the file is executable.
Now, at the command line, type
./html-md.sh

NOTE: this should work recursively, so any subfolders of the folder it is in should also get converted. If you have many folders with HTML notes in them, just put the html-md.sh file in the folder that contains them all (one level up) and it will convert all of those folders.

The original HTML files will still be there but there will be a new .md version of each one.
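
For anybody who would rather see what a script like this does before running it, here is a minimal sketch of the same idea. It is not the contents of the attached html-md.sh, which may well differ. The function name convert_html_tree and the choice of GitHub-flavoured Markdown (gfm) as the output format are my assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for html-md.sh; the attached script may differ.

# Convert every .html file under the given folder (default: current folder)
# into a .md file next to it. The original HTML files are left untouched.
# Assumption: file names do not contain newline characters.
convert_html_tree() {
    root="${1:-.}"
    find "$root" -type f -name '*.html' | while IFS= read -r html_file; do
        # Swap the .html extension for .md
        md_file="${html_file%.html}.md"
        # -f html: input format, -t gfm: GitHub-flavoured Markdown output
        # (gfm is an assumed choice; plain "markdown" is also valid)
        pandoc -f html -t gfm -o "$md_file" "$html_file" </dev/null \
            || echo "Failed to convert: $html_file" >&2
    done
}
```

To use it, save it to a file, add a last line reading convert_html_tree . (the dot means "this folder"), and run it the same way as above. Because it relies on find, it walks into subfolders by itself.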

Please note that Pandoc has some limitations. For example, it does not convert checklists; they will appear just as plain text. Any HTML that uses odd Microsoft formatting or class names may not appear either. Most of the work with the Synology HTML data is cleaning the HTML and recreating it without the things Pandoc cannot understand. This is where most conversion programs struggle, as you have to fix each issue before conversion, or manually convert it to markdown and add it back in after conversion (like I have to do with checklists).

If you find there are large parts not converted, you may find another program somewhere, but I’ll add the HTML to md option to my current project for version 1 (link to project board). That could be 3 or more weeks away: it’s 90% done but not documented yet, and adding in the HTML will take a couple of days.

Anybody who finds some HTML from OneNote that is not converted can share it with me, and I’ll look into ways of cleaning/fixing it for version 2…

Version 2 would support getting the data out of OneNote as well, but that is a long way off, with other things to do in between. If anybody wants to collaborate on writing YANOM, they can take a look at GitHub.

Anybody reading this far might want to look at the YANOM project wiki page for a non-geek explanation of note conversions, why closed file systems like OneNote are so horrible compared to Obsidian, and why the YANOM project exists.

Good luck

Kevin