Easy way to markdown Microsoft Word docs into Obsidian?

I would like to convert whole Word docs to Markdown and have them in Obsidian. I am a total newbie and have no coding experience, so when I’ve seen this topic discussed in the forum before, it’s all been over my head. Could someone lay out a step-by-step how-to in simple terms for someone like me?

Thanks!

Here are a few ideas for your problem.

One way:

  1. Export the Word document to HTML (Save As …)
  2. Import the HTML into Obsidian via copy & paste

Another way:

Try a browser extension (an Obsidian or Markdown web clipper) to get the text out of the HTML file.

A third way:

Try an HTML plugin:


Thanks so much for these suggestions. I tried number one, but it kept the docx formatting, which replaces the bullets with weird characters and doesn’t translate the tabbed bullets correctly.

If I were going to use a plugin, what are the steps to do that? I installed that plugin and tried to follow the instructions, but it failed.

I looked at a plugin called Pandoc that’s supposed to convert word doc to obsidian markdown, but I don’t know enough about how to install it (looks complicated).

Thanks for your help.

Also, check out https://www.writage.com/. I used the free trial a while ago helping someone convert a pile of .docx to .md and it worked fine.


I would install pandoc outside obsidian and at the command prompt do the conversion.


pandoc mydoc.docx -o mydoc.md


Hi thanks. I don’t understand what this means. Could you explain please? (please assume I know nothing)

Ask an advanced user who can help you with your PC.

  1. Install Pandoc from here: Pandoc - Installing pandoc, and let the installer set the path (see checkbox)

  2. Press <win-r> in Windows 10

  3. Type in cmd

  4. Press <enter>

  5. Change to the folder where your Word document (for example input.docx) lives by typing cd followed by the path to that folder:

cd C:\Users\whateveryoursystemuses\Desktop

  6. Press <enter>

  7. Type or better copy & paste this command:

pandoc --extract-media ./myMediaFolder input.docx -o output.md

  8. Press <enter>

The result is a file output.md made from your Word document input.docx. All of the included images are now in the folder myMediaFolder.
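For anyone who would rather run a script than type the command by hand, the steps above can be sketched in Python. This is only a sketch, assuming Python 3 is installed and Pandoc is on the PATH. The function names build_pandoc_command and convert_docx are made up for illustration, and the output file is named after the input (input.docx becomes input.md) rather than output.md.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, media_dir="myMediaFolder"):
    """Return the pandoc command from the steps above as a list of arguments."""
    docx = Path(docx_path)
    output = docx.with_suffix(".md")  # input.docx -> input.md
    return [
        "pandoc",
        "--extract-media", str(media_dir),
        str(docx),
        "-o", str(output),
    ]


def convert_docx(docx_path, media_dir="myMediaFolder"):
    """Run pandoc on one Word file; raises if pandoc is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    subprocess.run(build_pandoc_command(docx_path, media_dir), check=True)


if __name__ == "__main__":
    # Placeholder file name, same as in the steps above.
    # This only prints the command; call convert_docx("input.docx") to run it.
    print(build_pandoc_command("input.docx"))
```

The shutil.which check is also a quick way to find out whether Pandoc is actually installed, since it is a command-line tool and may not show up among the regular applications.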

HTH


Thanks so much for spending the time to write this! Unfortunately, I use a Mac, and when I downloaded the application and installed it, I couldn’t find it anywhere on my computer. It’s not in my Applications folder. When I search for pandoc, nothing comes up. Would it be called something else? Sorry for the trouble.

Thanks! Unfortunately it’s for PC only. I’m on a Mac.

I used the instructions here and it failed due to network issues (How to install pandoc on a mac · GitHub).

Ok so last attempt: does anyone know of a (secure) online tool to use that requires no software downloads?

Six months ago, I knew next to nothing myself.
I had 14000 pages of Word docs (on Windows) with various formatting (mainly bold text that came in handy to be changed to Wikilinks as they were basically prospective titles made for Obsidian) that needed to be converted to Markdown.

For me, Pandoc didn’t work out. My Word files were simply too badly formatted, with lots of colours, bold and italics, indentations (sometimes 3-4 levels deep) etc. What’s worse, I was using Pages on iOS as well, so everything was in shambles. Mostly the start and end points of bold text were shot to hell. Also, Pandoc didn’t extract my images properly (the program kept overwriting my image1, image2, etc. files, and I couldn’t find a proper batch/PowerShell script to help me).
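One possible way around the overwritten image files, which is not what this poster ended up doing, is to give every document its own media folder. The Python sketch below only builds the commands and does not run them. The names media_folder_for and plan_batch and the "_media" suffix are invented for illustration, and it assumes Pandoc is on the PATH when the commands are eventually run.

```python
from pathlib import Path


def media_folder_for(docx_path):
    """Derive a media folder name from the document name (report.docx -> report_media)."""
    docx = Path(docx_path)
    return docx.with_name(docx.stem + "_media")


def plan_batch(docx_paths):
    """Build one pandoc command per Word file, each with its own media folder."""
    commands = []
    for docx in sorted(Path(p) for p in docx_paths):
        commands.append([
            "pandoc",
            "--extract-media", str(media_folder_for(docx)),
            str(docx),
            "-o", str(docx.with_suffix(".md")),
        ])
    return commands


if __name__ == "__main__":
    # Print the planned commands for every .docx file in the current folder.
    for command in plan_batch(Path(".").glob("*.docx")):
        print(" ".join(command))
```

Because each document writes into a differently named folder, the image1, image2, etc. files from one document can no longer replace those of another. To actually run a command, pass it to subprocess.run(command, check=True).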

So I went this way (my case is special because of the large volume of text):

I was looking around on various forums and getting help from admins (mostly retired Aussies) on how to make my VBA macros.
Mostly I figured everything out for myself as I went along.

I used macros to change my makeshift titles to Heading 1, and with that I created section breaks so that each title started on a new page. Then I saved each section as its own file, using the Heading 1 text as the file name.
I needed to batch convert my docx files to txt files (to be easily renamed to md files later).
On one forum I received and tweaked a macro that identifies inline shapes (embedded pics) and changes the inner reference to Obsidian references (e.g. C:/...User/...Obsidian/Vaultname/A-Z folders/assets/activedocumentname_image1, 2, 3, etc.).
I was using another macro that extracted from each docx file all uncompressed pics (with the HTML conversion, only compressed pics are extracted).
In the main overhaul macro I changed bold text to Wikilinks, orange coloured texts to highlights and footnotes to endnotes, etc.
After doing all that I started learning some regexes and cleaning up my documents.
In Obsidian I used this plugin to find all broken links. More manual work…
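The bold-to-Wikilink step above was done with a VBA macro inside Word. For files that are already Markdown, the same idea can be sketched with a regex in Python. This is an illustration under assumptions, not the poster’s macro: it assumes bold text is written as **text**, and the function name bold_to_wikilinks is made up.

```python
import re

# Matches **bold text** on a single line and captures the text inside.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def bold_to_wikilinks(markdown_text):
    """Turn every **bold** span into an Obsidian [[Wikilink]]."""
    def replace(match):
        # Trim stray spaces that badly placed bold markers may have captured.
        return "[[" + match.group(1).strip() + "]]"

    return BOLD.sub(replace, markdown_text)


if __name__ == "__main__":
    print(bold_to_wikilinks("See **Project Alpha** for details."))
```

Running this over a whole vault would turn every bold span into a link, so it only makes sense when bold was used consistently for prospective note titles, as described above.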


If you ask the right questions on forums, you’ll get there one task at a time. If you have 500-1000 pages worth of Docs, I’d probably advise you to get down to it and do most things manually.

I remembered one forum thread where I was asking about this. If you follow my nick on that forum and other forums, you might be able to track down some of the steps.

This is no easy way. It was my way. I learnt the hard way. But as somebody else pointed out or implied, if you are hopeless on PC, you might need to buy someone’s help at your location.

Cheers


Once you have the command-line Pandoc app installed, you can use the open-source PanWriter markdown editor as a graphical front end. From the dev’s site: “Simply drag a .docx file onto the PanWriter app: it will be converted to Markdown and opened…”


This is super helpful, thank you! I may need to go this way. My budget is small so I can’t pay anyone. So I’ll try your method!

First step: do you have any advice on how to start with the step you called “macros”? Any guides on that I can look at? Or maybe names of people I can ask for help?

Thank you!

Oh geez! I found this website, which is going to be 90% effective for me!

Putting it here for others who have my problem in the future: https://word2md.com/

A VBA macro is used to make repetitive tasks easier. You can run search/replace on MS Office files in folders/subfolders, making changes in a batch.
You can start by recording your own macro and seeing what code is generated. Believe me, in a few days you will understand how it all works, even without having learnt coding before.

Try looking around for similar questions and copy the code the askers received, then make some changes to it. Then ask your own question (worded so that future askers can benefit from the answers) on forums that deal with VBA macros. When admins and other posters see that you pasted in your own attempt and explained that it didn’t work, they gather that you are interested in learning, and they will help (for free).
Regexes are especially worth learning because they will make your life easier in the long run.

Thanks for your answer. I don’t understand most of it.

But anyway… I am using the above link very effectively. There is one thing it does which is not perfect: instead of making a sub-bullet a [tab] - , it makes it a [space] - . Is there a way to automate a conversion of space bullets to tab bullets?

That’s what a VBA macro is for: using the computer’s power to speed things up, for example by automating the change of indents (tabs) to dashes (which will turn into bullets in Obsidian).

You should start by learning the Markdown syntax used in Obsidian first (so that your source writing is legit); only then will you know what automations you need.
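For the space-bullet question specifically, a macro is not the only route. The same fix can be sketched in a few lines of Python that work on the converted Markdown text. This is a sketch under assumptions: it assumes each sub-bullet level is indented by a fixed number of spaces (one by default, going by the description above; adjust spaces_per_level if your files differ), and the function name spaces_to_tabs is made up.

```python
import re

# Leading spaces followed by a bullet marker ("- ", "* " or "+ ").
BULLET = re.compile(r"^( +)([-*+] )")


def spaces_to_tabs(markdown_text, spaces_per_level=1):
    """Replace the leading spaces before a bullet with one tab per indent level."""
    fixed_lines = []
    for line in markdown_text.split("\n"):
        match = BULLET.match(line)
        if match:
            # Assumption: every indent level uses the same number of spaces.
            levels = max(1, len(match.group(1)) // spaces_per_level)
            line = "\t" * levels + line[match.end(1):]
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    sample = "- top\n - sub\n  - subsub"
    print(spaces_to_tabs(sample))
```

Lines that are not indented bullets are left untouched, so the function can be applied to the whole text of a note before pasting it into Obsidian.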