This is something that I quickly hacked together for someone in the community who needed to batch convert a whole bunch of docx files into markdown on Windows. I found a couple of GUIs for pandoc, but none of them could process docx files in batch for some reason, so I decided to do it with PowerShell instead.
There is a lot of room for improvement, but I’m leaving this here first before I forget about it. I’ll come up with a bash/zsh compatible version too at a later date for macOS & Linux users.
Reboot your computer after installing pandoc if you are on Windows (the installer updates $PATH on Windows, and a reboot is required for the change to take effect).
Make a new folder and copy your documents into it. Hold down Shift, right-click an empty area in File Explorer, and click on “Open PowerShell here”.
Try running pandoc --help. If you get a bunch of help text, then pandoc is properly installed. Paste the following into the console and it should be able to convert all of the docx files in this folder to markdown.
The pandoc manual has a comprehensive list of all supported formats. If you want to convert something that Obsidian can’t import directly, simply change the filtering parameter -Filter *.docx and the --from docx argument to your source format: https://pandoc.org/MANUAL.html
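As a rough sketch of that substitution, the function below takes the file extension and the pandoc reader name as parameters. The function name Convert-FolderToMarkdown and its parameter names are made up for this example and are not part of the original script. It assumes pandoc is on the PATH and that the files sit in the current folder.

```powershell
# Hypothetical helper (name and parameters are assumptions for illustration):
# converts every file with the given extension in the current folder to markdown.
function Convert-FolderToMarkdown {
    param(
        [string]$Extension = 'docx',  # file extension to look for, without the dot
        [string]$Format    = 'docx'   # pandoc reader name passed to --from
    )
    Get-ChildItem -Path . -Filter "*.$Extension" -File | ForEach-Object {
        # Write the .md file next to the source file, keeping the same base name
        $target = Join-Path $_.DirectoryName ($_.BaseName + '.md')
        pandoc --from $Format --to markdown --wrap=none $_.FullName -o $target
    }
}

# Example: convert EPUB files instead of Word documents
Convert-FolderToMarkdown -Extension epub -Format epub
```

The extension and the reader name are kept as separate parameters because they do not always match. Check the manual for the reader name that corresponds to your source format.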
Thank you.
Works well (within the limits of pandoc).
I tested it with:
- a docx file with a table (interesting visual for a row in italics)
- an epub of War and Peace (too big for Typora, handled fine by Obsidian if somewhat slow loading; some glitches on chapter headings and Index)
A few glitches in complex documents are normal in pandoc, and I have to remember it uses markdown links, but this was very impressive for the script and Obsidian.
I have encountered a problem: when converting multiple docx documents, if the images inside them share the same name, like image1.png, each one overwrites the previous one so that only one file remains. How can I solve this problem? Thank you!
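One possible way around the collision, assuming the images are being written out with pandoc's --extract-media option, is to give each document its own media folder. The sketch below is not from the original script. The function name Convert-DocxWithMedia and the "_media" folder suffix are arbitrary choices for illustration.

```powershell
# Hypothetical helper (name and "_media" suffix are assumptions for illustration):
# converts each .docx in the current folder and extracts its images into a
# separate folder per document, so identical image names cannot overwrite each other.
function Convert-DocxWithMedia {
    Get-ChildItem -Path . -Filter *.docx -File | ForEach-Object {
        # e.g. "report.docx" gets the media folder "report_media"
        $mediaDir = $_.BaseName + '_media'
        pandoc --from docx --to markdown --wrap=none "--extract-media=$mediaDir" $_.FullName -o ($_.BaseName + '.md')
    }
}

Convert-DocxWithMedia
```

With this layout, the image links in each markdown file point into that document's own folder, so the .md file and its media folder need to be moved together.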
If you use Windows, search for PowerShell and open the PowerShell ISE app.
Copy and paste this code:
# tell our computer we trust our ability to download packages
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
# download a single package that we trust
Invoke-Expression (New-Object System.Net.WebClient).DownloadString('https://get.scoop.sh')
# scoop screens packages for us, so the packages available on scoop are generally more trustworthy
# wget allows for downloading from the web
scoop install wget
# pandoc allows for converting between many types of document
scoop install pandoc
Run it.
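Once the install finishes, a quick check like the one below can confirm that pandoc is reachable before moving on. This is only a sketch. The helper name Test-PandocInstalled is made up for this example, and it relies on nothing beyond the built-in Get-Command cmdlet.

```powershell
# Hypothetical helper (not from the original post): reports whether a
# command named "pandoc" can be found in the current session.
function Test-PandocInstalled {
    return [bool](Get-Command pandoc -ErrorAction SilentlyContinue)
}

if (Test-PandocInstalled) {
    # Print the installed version as a sanity check
    pandoc --version
} else {
    Write-Warning 'pandoc was not found. Try opening a new PowerShell window or rebooting so the updated PATH is picked up.'
}
```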
Then you can make a new file (Ctrl + N)
And run the code from this page, after replacing the directory with your own.
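# set the working directory (replace this with the folder that holds your documents)
cd 'C:\Users\myusername\myfolder\word-documents-for-converting'
# find all .docx files in the current directory and convert each one to markdown
# $_.FullName is the full path of the file; $_.BaseName is its name without the extension
Get-ChildItem . -Filter *.docx | ForEach-Object { pandoc --from docx --to markdown --wrap=none $_.FullName -o ($_.BaseName + '.md') }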