Automatically retrieve highlights from PDFs and link them with Obsidian note blocks

Before Obsidian, I used Docear to manage my PDF annotations.
Docear can scan PDFs and retrieve highlights and annotations, like zotfile. It can also remember each annotation’s position in the PDF file. Unlike zotfile, however, it can retrieve annotations from multiple files at once, and it does so faster.

Currently, @argentum has written mdnotes to bridge Obsidian and Zotero. However, I think it would be more efficient to retrieve annotations straight from the PDF into Obsidian. Each PDF would have its own md file, with each block linking to its respective PDF file (basically zotfile for Obsidian).
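To make the request concrete, here is a minimal sketch in Python of what "one md file per PDF, each block linking to its PDF" could look like. It assumes the highlights have already been extracted from the PDF by some other tool. The `Highlight` type, the `render_pdf_note` function, and the `hl-N` block ID scheme are all hypothetical names made up for illustration. The `^id` block marker and the `#page=` link suffix follow Obsidian's note syntax as I understand it.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Highlight:
    page: int  # 1-based page number in the PDF
    text: str  # highlighted text, assumed to be extracted already


def render_pdf_note(pdf_path: str, highlights: list[Highlight]) -> str:
    """Render one Markdown note for one PDF, with one block per highlight."""
    pdf_name = PurePosixPath(pdf_path).name
    lines = [f"# Highlights from {pdf_name}", ""]
    for index, highlight in enumerate(highlights, start=1):
        block_id = f"hl-{index}"  # hypothetical ID scheme: letters, digits, hyphens
        text = " ".join(highlight.text.split())  # collapse line breaks from the PDF
        link = f"[[{pdf_name}#page={highlight.page}|p. {highlight.page}]]"
        lines.append(f"{text} ({link}) ^{block_id}")
        lines.append("")  # blank line so each highlight is its own block
    return "\n".join(lines)
```

Each highlight becomes its own paragraph, so other notes could reference it by its block ID, and the link at the end points back to the page in the PDF.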

Docear is open source, and they uploaded all their code to their website.

Additional request: It would be even better if this plugin also linked with Zotero (as we still need Zotero to cite in MS Word documents).


For the Keypoints app I’m currently developing (it’s not available yet), I’ve implemented something similar based on individual plaintext notes. Goals are to facilitate an academic reading workflow & knowledge management. See this thread for more info and a little screencast.

In the future, I hope to integrate this with knowledge management apps such as Obsidian. This would allow you to directly push your literature notes from Keypoints to Obsidian (ideally in a customizable template-based format).


I suggested something similar here: Markdown layer on top of pdf

But yes, I agree: we need something to make the academic workflow more efficient.

I use an app called Highlights that automatically saves your PDF highlights in Markdown. It’s not ideal in several respects (it used to be quite buggy, though that is much better now, and it’s not very customizable), but it works really well for this kind of thing. Skim, which is open source afaik, also has some useful scripts for exporting annotations into md.
That said, I’ll definitely try Keypoints once it’s out :).


Hi there! I created a plugin in December that, like zotfile, takes an annotated PDF and extracts the text highlights into markdown.

Extracting PDF annotations is surprisingly, shockingly hard – much harder than I had expected. It took me weeks to get it working. Unfortunately, since Obsidian upgraded their internal PDF library a few weeks back, the plugin has to be completely re-written as well to match their version again.

Anyways, I don’t have time at the moment to do this full refactor, but the code and commits are there in case anyone wants to help out.


Good news! I fixed the issue for the above plugin. Give it a spin, the name is “PDF Highlights” in the community plugins.


Hey akaalias, would it be possible to extract the highlights as atomic quotes and reference them in a parent page?
It would be amazing if the atomic quotes or blocks would then also have the metadata included.

Hey there!

I think I understand what you envision. But just to be sure: Can you walk me through how this would work step-by-step?

This is what I’m thinking of

  1. import a PDF to Obsidian
  2. automatically generate an empty page for the PDF (metadata of the PDF saved in this file)
  3. manually highlight text in a PDF editor
  4. highlights are automatically extracted as atomic blocks which have the same metadata plus a page reference
  5. excerpts are listed as block references under the generated “empty” page
Benefits:
I would have a page of highlights for every PDF
Metadata and a page reference would be given for every quote
Every quote could have extra comments in their metadata or on the summary page
I could link every block in the graph since they have their own page

Does that make sense?
I can imagine that this would be a lot of work