EDIT: I have forked the original script and improved it significantly, so this post is heavily edited.
The original developer is too busy to make any changes, so I forked the script and made many improvements:
can handle top-level sections, section groups and 1 level of nested section groups
prompt to choose between adding a prefix or creating subfolders for subpages
prompt to choose between 6 markdown formats, defaulting to Pandoc, which works best with Obsidian
prompt to choose whether images are all put in a central folder for each notebook or in the same folder as the .md file
documents have relative references to the images
prompt to discard or keep the intermediate .docx files
can remove blank lines and “\” escape characters that are created by the converter
embedded/attached files (e.g. PDFs) are stored in the same media folder as images. Any symbols in file names are removed so they link correctly
and probably some other stuff as I continue to improve it.
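To give a rough idea of what the cleanup and file-renaming items above involve, here is a minimal sketch in Python. It is not the code from the PowerShell script; the function names `clean_markdown` and `sanitize_filename` are made up for illustration, and the exact rules (which characters count as "symbols", whether spaces are kept) are assumptions.

```python
import re


def clean_markdown(text: str) -> str:
    """Drop blank lines and strip backslash escapes left by the converter.

    Assumption: an "escape" is a backslash directly before a
    punctuation character, e.g. "\\_" or "\\[".
    """
    # Keep only lines that contain something other than whitespace.
    lines = [line for line in text.splitlines() if line.strip()]
    # Remove the backslash but keep the character it was escaping.
    return "\n".join(
        re.sub(r"\\([^A-Za-z0-9\s])", r"\1", line) for line in lines
    )


def sanitize_filename(name: str) -> str:
    """Remove symbols from a file name while keeping its extension.

    Assumption: letters, digits, underscores, hyphens and spaces are kept.
    """
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No extension present, so the whole name is the stem.
        stem, ext = name, ""
    stem = re.sub(r"[^\w\- ]", "", stem)
    return f"{stem}.{ext}" if ext else stem
```

For example, `sanitize_filename("Q3 report (final) #2.pdf")` gives `Q3 report final 2.pdf`, which can then be used both for the file on disk and for the link inside the .md file.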
Be sure to install Onetastic and the .xml macro that I included in order to automatically expand any collapsed paragraphs - they won’t be exported otherwise.
Feel free to check it out and submit any issues/comments/suggestions here or on GitHub.
Hope this helps!
Edit:
I have passed the maintenance and development of this project on to someone else. You can get the latest and greatest version here.
Edit:
I’ve noticed an inherent limitation to this workflow - it exports from OneNote to Word, then converts the Word docs to Markdown. But Word has a limit of 9 bullet levels, so if you have more than that, the nesting gets garbled. I recommend “refactoring” any large notes in OneNote prior to export/migration.
Try running both OneNote and PowerShell in Administrator mode (right click on the icon and select Run as Administrator). Also, ensure you have Microsoft Word installed. If you don’t, I don’t think you can export to .docx.
Yeah, I tried converting some Chinese characters, but I get question marks as well. I have asked for some help on another forum and will let you know if I find a solution, but I’m not hopeful.
You could try converting with Evernote and Joplin - the Evernote desktop app can easily import from OneNote, and Joplin is another desktop app that can convert from Evernote to Markdown. Some people use Evernote -> Notion -> Markdown, but that requires uploading and a trial pro account, etc. Maybe just create a small sample notebook to test it on quickly.
Could you share an example of one of the notes that you are trying to convert? Perhaps manually export both a .one and a .docx file from OneNote, and then attach the .md file that comes from the Pandoc script? You can put them all in a zip file and attach it in a comment here.
Ok, I found the problem. It isn’t Pandoc - it’s something with character encoding in PowerShell. I tried to figure it out, but wasn’t able to.
Here’s a modified script that works with Chinese characters, however it is less useful than the original as I had to get rid of some of the formatting functions (adding title and date, getting rid of double spaces, etc.). The filenames for images won’t have the associated page name either, but they do seem to work. I hope this helps, but I’d still recommend checking out OneNote -> Evernote -> Joplin.
Can this be solved by changing PowerShell’s default character encoding? Or by using another command to change its character encoding temporarily, like chcp 65001? When Chinese is the default language in Windows, PowerShell’s default character encoding is usually GB2312, not UTF-8.
Just to give some hints, I haven’t tried to solve this myself. Hope this helps you find a good solution.
Pandoc converts it properly, and defaults to UTF8. My PowerShell defaults to UTF8 (this must be the default now, because I never changed it) and I tried using UTF16 as well.
The problem is that any of the functions that do a search/replace within the text (renaming the images, inserted files, and cleaning up some of the symbols/formatting) break with any Chinese (or Korean or Japanese) characters, even when I set it to encode with utf8 on reading from the file and writing back to the file. More specifically, the issue seems to be with any sort of function or read/write from a variable that is derived from the OneNote xml schema. When I simply write a line to the file with something like $var = “事件基本信息”, it writes properly, but $var = $page.name (which is in Chinese characters from the XML) doesn’t work. Interestingly, if I print that variable to the Console, it prints fine. But, again, even just a search/replace for symbols like “\” breaks it when they are on a line with Asian characters. It’s gotta be a mismatch with encoding between PowerShell and the actual document.
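For anyone wondering how an encoding mismatch produces question marks, here is a small Python sketch of the general mechanism. It does not reproduce the PowerShell/OneNote XML problem described above, and `replace_in_file` is a hypothetical helper written only for this illustration; it just shows what happens when text passes through the wrong encoding, and what a search/replace looks like when UTF-8 is stated explicitly on both the read and the write.

```python
page_name = "事件基本信息"  # sample page title taken from the post above

# Encoding into a charset that cannot represent the characters
# replaces each one with a question mark.
print(page_name.encode("ascii", errors="replace"))  # b'??????'

# UTF-8 bytes decoded with a different code page come back as other
# characters, so a later search/replace no longer sees the original text.
garbled = page_name.encode("utf-8").decode("gbk", errors="replace")


def replace_in_file(path: str, old: str, new: str) -> None:
    """Search/replace in a text file, naming UTF-8 for both read and write."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace(old, new))
```

The point of `replace_in_file` is only that the encoding is named at both ends instead of being left to the platform default. Whether the equivalent settings fix the script is untested.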
Anyway, I spent a few hours on it, couldn’t get it to work, gave up, and supplied a “fix” that doesn’t apply the changes that break the conversion, which should be 90+% of the way there. I’m a hacker in the hackiest sense, so if you or someone else wants to/can fix it, please go for it and share the results/fix here or on GitHub.
You could even try posting an issue at the original GitHub repo - that guy is a wizard, but I suspect he has heard more than enough from me (which is what spurred me to fork my own version to begin with).
Thanks for working on getting OneNote into Obsidian.
I pasted the onenote2markdown script into PowerShell (run as administrator) and I got the message below at the end of the script run (I’m running the latest OneNote and can’t seem to open it as administrator).
That is odd. Did you literally copy and paste the script code into PowerShell, or did you run the script by navigating to the folder it is saved in and typing .\ConvertOnenote2Markdown-v2.ps1?
Also, I believe that it is necessary to have OneNote and PowerShell either both run as administrator, or both NOT run as admin. Try using a normal PowerShell terminal.