Annotation feature in Obsidian base is not pulling annotations in all cases

Steps to reproduce

Having followed all the steps I was asked to take, and at the request of the primary maintainer of PDF++, I am filing this bug report to show that not all PDFs in the upstream annotator match the call comment that Obsidian uses in its default annotator. As a result, all plugins that build on this PDF functionality inherit an issue where the text of some OCR'd PDFs fails to be captured by the extractor.

(Note: take the above description with a grain of salt. I was asked by the developer to submit this, and I wrote it as well as I could given that I’m not a developer.)

Link: [Feature] Better support for annotated text extraction · Issue #224 · RyotaUshio/obsidian-pdf-plus · GitHub

Did you follow the troubleshooting guide?

Yes.

Did you try the above steps in the sandbox vault?

Yes. The issue was reproduced in the default sandbox vault, as shown here:
Link: Not sure what is different, or if it's Adobe, but something weird is happening... · RyotaUshio/obsidian-pdf-plus · Discussion #223 · GitHub

Expected result

As stated, the issue only occurs on some PDFs, and as I’m submitting this ticket I’m still honestly not sure how this works. It may be worth reaching out to @ush directly. That said, my personal contribution, which brought this to his attention, is linked here, and the expected result is shown in the image there:

Link: Not sure what is different, or if it's Adobe, but something weird is happening... · RyotaUshio/obsidian-pdf-plus · Discussion #223 · GitHub

Actual result

As listed further down in my response, you can see that the 2023 PDF (the one that does not work) returned a null highlight on extraction. I cannot figure out what distinguishes a PDF that works from one that doesn’t, for either the comment pulled or the highlight itself.

My post: Not sure what is different, or if it's Adobe, but something weird is happening... · RyotaUshio/obsidian-pdf-plus · Discussion #223 · GitHub

Environment

SYSTEM INFO:
Obsidian version: v1.6.3
Installer version: v1.5.12
Operating system: Windows 10 Pro 10.0.22631
Login status: logged in
Catalyst license: none
Insider build toggle: off
Live preview: on
Base theme: dark
Community theme: Vauxhall v1.0.1
Snippets enabled: 1
Restricted mode: off
Plugins installed: 12
Plugins enabled: 8
1: Auto Link Title v1.5.4
2: TagFolder v0.18.7
3: ePub Reader v1.0.2
4: Tag Wrangler v0.6.1
5: Media Extended v3.1.0
6: PDF++ v0.39.23
7: Dataview v0.5.66
8: Text Extractor v0.5.2

Additional information

For additional information (because I am definitely not the one to talk to about this), please contact @ush.

(Well, I didn’t really intend to “ask” her to file a bug report. My intention was to suggest sending a feature request or bug report as one of the possible options. My English might have been ambiguous in the linked GitHub Discussion thread. I’m sorry for the confusion.)

Since the original report does not fully follow the troubleshooting guide and the template, let me re-report this problem.



In some PDFs, the extraction of annotated text fails, leading to the two problems described below. (As I said in the linked GitHub Discussion thread, I’m not sure whether this is considered a bug, although the current behavior is pretty counterintuitive.)

Steps to reproduce

sample_pdf.zip (123.7 KB)

Open the sandbox vault, then download the attached .zip and unzip it in the vault.

Problem 1

  1. Open not_working.pdf, click the highlight annotation “Hi, welcome to Obsidian!”
  2. In the annotation popup, click the “Copy” button. Then paste it to a note.

Problem 2

Right-click on the aforementioned highlight and see what menu items are displayed.

Did you follow the troubleshooting guide? [Y/N]

Y

Expected result

Problem 1

As in the case of working.pdf, not only the link but also the annotated text should be copied, like so:

> Hi, welcome to Obsidian!

[[not_working.pdf#page=1&annotation=28R]]

Problem 2

As in the case of working.pdf, both “Copy link to annotation” and “Copy annotation” should be shown.

Actual result

Problem 1

The annotated text is not extracted, and only the link is copied.

[[not_working.pdf#page=1&annotation=28R]]

Problem 2

Only “Copy link to annotation” is shown, and “Copy annotation” is not shown.
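
Both symptoms are consistent with the copy action and the context menu depending on the same extracted text. The following is a minimal sketch of that relationship, not the actual Obsidian or PDF++ code: buildCopiedText and menuItems are hypothetical names invented for illustration, and only the output format is taken from the expected and actual results above.

```typescript
// Hypothetical sketch: these function names are illustrative only,
// not part of Obsidian's or PDF++'s real API.

// Build the clipboard text for an annotation. When no text was extracted,
// only the link remains, as in the "Actual result" of Problem 1.
function buildCopiedText(
  file: string,
  page: number,
  annotationId: string,
  text: string | null
): string {
  const link = `[[${file}#page=${page}&annotation=${annotationId}]]`;
  return text ? `> ${text}\n\n${link}` : link;
}

// Decide which context-menu items to show for an annotation.
// "Copy annotation" is offered only when there is text to copy (Problem 2).
function menuItems(text: string | null): string[] {
  const items = ["Copy link to annotation"];
  if (text) items.push("Copy annotation");
  return items;
}
```

Under this assumption, a single failed extraction (null or empty text) explains both the link-only clipboard content and the missing “Copy annotation” item.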

Environment

SYSTEM INFO:
Obsidian version: v1.6.3
Installer version: v1.6.3
Operating system: Darwin Kernel Version 22.6.0: Mon Feb 19 19:43:41 PST 2024; root:xnu-8796.141.3.704.6~1/RELEASE_ARM64_T8103 22.6.0
Login status: logged in
Catalyst license: insider
Insider build toggle: off
Live preview: on
Base theme: adapt to system
Community theme: none
Snippets enabled: 0
Restricted mode: off
Plugins installed: 0
Plugins enabled: 0

RECOMMENDATIONS:
none


Additional information

  • The technical details are described here: [Feature] Better support for annotated text extraction · Issue #224 · RyotaUshio/obsidian-pdf-plus · GitHub
    In short, this problem is caused by how the PDFViewerChild.prototype.getTextByRect method is implemented.
    Here, PDFViewerChild means the class of view.viewer.child, where view is a PDF view. (I don’t know the true class name; I chose this name for my plugin’s typing purposes.)
  • It might be worth mentioning that Adobe Acrobat can successfully extract the annotated text in the attached sample PDF (right-click on the highlight > “Copy text”).
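
For readers unfamiliar with this kind of lookup, here is a generic sketch of rectangle-based text extraction. It assumes, from the method name alone, that the text is found by comparing a rectangle against the positions of text items. It is not the actual getTextByRect implementation (see the linked issue for that), and Rect, TextItem, overlaps and textInRect are hypothetical names.

```typescript
// Hypothetical sketch of a rectangle-based text lookup.
// Rect, TextItem and textInRect are illustrative names, not the real
// getTextByRect implementation or any Obsidian/PDF.js type.
interface Rect {
  left: number;
  bottom: number;
  right: number;
  top: number;
}

interface TextItem {
  str: string;
  rect: Rect;
}

// True when the two rectangles share some area.
function overlaps(a: Rect, b: Rect): boolean {
  return (
    a.left < b.right && b.left < a.right &&
    a.bottom < b.top && b.bottom < a.top
  );
}

// Concatenate, in input order, the text items overlapping the annotation rectangle.
function textInRect(items: TextItem[], annotationRect: Rect): string {
  return items
    .filter((item) => overlaps(item.rect, annotationRect))
    .map((item) => item.str)
    .join("");
}

// Made-up coordinates for illustration only.
const sampleItems: TextItem[] = [
  { str: "Hi, welcome ", rect: { left: 10, bottom: 700, right: 80, top: 712 } },
  { str: "to Obsidian!", rect: { left: 80, bottom: 700, right: 150, top: 712 } },
  { str: "Another line", rect: { left: 10, bottom: 680, right: 90, top: 692 } },
];

// A rectangle that lines up with the first text line finds its text.
const matched = textInRect(sampleItems, { left: 10, bottom: 699, right: 150, top: 713 });

// A rectangle that lines up with no text item yields an empty string.
const unmatched = textInRect(sampleItems, { left: 10, bottom: 100, right: 150, top: 113 });
```

In a scheme like this, any PDF whose annotation rectangle does not line up with the text items as the lookup expects would produce an empty result, even though the highlight is visible on the page.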

Thanks, we will double check this.


Hey @whitenoise!

I am terribly sorry that I haven’t popped in here in months. I ended up working with @ush on this issue in PDF++, and he ended up solving it by proxy, which is FANTASTIC. (If you came across this issue here, I highly recommend buying Ryota a quick 5-buck coffee, as he worked SUPER HARD on this!)

That said, to learn more about this issue, here are the GitHub issues where we worked on this for months - it’s solved now!


yuppers - issue resolved.