I’ve been copying important documents into my vault’s PDF folder, but I often wish I had a markdown version of a given PDF instead.
I often start with this site to convert PDF to MD, grab the results, and paste that text into a new note. The problem is there’s A LOT of cleanup needed. That’s driven an unhealthy obsession with regular expressions, but that’s another story…
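For anyone curious what that cleanup looks like, here is a minimal sketch. The three rules (rejoining words split by a hyphen at a line break, trimming trailing whitespace, collapsing runs of blank lines) are assumptions about typical PDF-extraction debris, not an exact list of what any particular converter produces, and `clean_pdf_markdown` is a made-up helper name.

```python
import re


def clean_pdf_markdown(text: str) -> str:
    """Tidy a few common PDF-extraction artifacts (illustrative rules only)."""
    # Rejoin words hyphenated across a line break: "conver-\nsion" -> "conversion".
    # Caveat: this also joins genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Strip trailing spaces and tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse three or more consecutive newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Normalize to exactly one trailing newline.
    return text.strip() + "\n"
```

The rules run in a fixed order on the whole note text, so it is easy to add or drop one as you learn what a given PDF needs.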
Sometimes I try Pandoc (where I’m a complete noob) and I get decent output, especially if I use Acrobat to wash the PDF into DOCX, then Pandoc to go from DOCX to markdown.
That works well, except that tables come out horribly: I get lines with plus signs all over them instead of standard Obsidian markdown table syntax. In some cases I’m going to be responsible for editing or re-creating these documents, so it would make sense to keep a markdown equivalent of each.
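On the table problem: the plus-sign lines are most likely Pandoc's grid tables, which its default markdown writer can fall back to. A minimal sketch, assuming Pandoc is installed and on the PATH, is to ask for GitHub-flavored markdown (`-t gfm`), which uses pipe tables of the kind Obsidian renders. The function names and file names here are hypothetical.

```python
import subprocess
from pathlib import Path


def pandoc_command(src: str, dest: str) -> list[str]:
    """Build a Pandoc call that converts DOCX to GitHub-flavored markdown."""
    return [
        "pandoc", src,
        "-f", "docx",    # input format
        "-t", "gfm",     # output format: GFM, which writes pipe tables
        "--wrap=none",   # do not hard-wrap long lines
        "-o", dest,
    ]


def docx_to_markdown(src: str) -> Path:
    """Convert src (a .docx file) to a .md file next to it; needs Pandoc installed."""
    dest = Path(src).with_suffix(".md")
    subprocess.run(pandoc_command(src, str(dest)), check=True)
    return dest
```

Calling `docx_to_markdown("report.docx")` would write `report.md` beside the original. Tables that pipe syntax cannot express (for example merged cells or multi-paragraph cells) may still come out as raw HTML, so some hand editing can remain.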
- What do you do with PDFs? Keep them external and import the notes only?
- Run them through a customized PDF-to-markdown filter?
- Some other system?