Would it be possible to search content inside of PDFs? Similar to how it functions in Evernote or OneNote.
Also it would be really cool if navigation can be improved in one way or another
- PgUp and PgDn or arrow keys for moving between pages
- Zoom in and out
- Ability to edit the page number in the indicator to jump from one page to another
Agreed. I would love for PDFs to be part of search, or added as a kind of alternative search, like "search in: PDFs".
I agree with @nicrivard and @divesh_code, these features would be very useful.
Not only would this be helpful in general, as some people have valuable/insightful information stored in PDFs that they'd like searched along with their standard vault, but it would also help with "web clipping": if one ran across a page with formatting that is difficult to translate to markdown, they could simply "print as PDF" and put the PDF in their vault.
The trouble is that this isn't a simple thing to do. Reading a PDF takes time, managing a very large number of them would require indexing, and if the text isn't already in extractable form, it will require OCR.
Docfetcher, a very useful open source text searcher (in maintenance mode last I heard, owing to the maintainer's lack of time), has the following in its FAQ:
Why are the DocFetcher installer and the other packages so large (> 30 MB)?
This is mainly due to the fact that DocFetcher is shipped with lots of built-in text extraction libraries, some of which are quite big. The worst offenders are the libraries for MS Office and PDF files. However, the developers of these libraries aren't to blame here: The libraries have to be big because the respective file formats are immensely complex.
The whole point of plaintext and markdown is that it is quickly and easily read. PDFs are a whole extra world of processing requirements.
My suggestion for people who need to be able to do a native Obsidian search on this text would be to bulk extract the text outside of Obsidian and put it into separate files with links to the original PDF.
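As a rough sketch of that bulk-extraction idea, assuming the `pdftotext` command-line tool from Poppler is installed: the script below walks a vault and writes a markdown "sidecar" next to each PDF, with a link back to the original. The function names, the `.pdf.md` naming scheme, and the `Source:` line are all made up for illustration, not anything Obsidian prescribes.

```python
import subprocess
from pathlib import Path
from typing import Callable


def pdftotext_extract(pdf: Path) -> str:
    """Extract text by calling the pdftotext CLI; '-' sends output to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def write_sidecars(
    vault: Path,
    extract: Callable[[Path], str] = pdftotext_extract,
) -> list[Path]:
    """Write '<name>.pdf.md' next to each PDF, linking back to the original."""
    written = []
    for pdf in sorted(vault.rglob("*.pdf")):
        text = extract(pdf)
        # Hypothetical naming scheme: 'paper.pdf' -> 'paper.pdf.md'
        sidecar = pdf.with_name(pdf.name + ".md")
        # Put a wikilink to the source PDF first, so a search hit leads back to it.
        sidecar.write_text(f"Source: [[{pdf.name}]]\n\n{text}", encoding="utf-8")
        written.append(sidecar)
    return written
```

Running `write_sidecars(Path("path/to/vault"))` once (and again whenever PDFs are added) would make the extracted text visible to the normal vault search. Scanned PDFs without a text layer would come out empty and still need OCR first.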
Thinking about it, maybe someone could produce a plugin that called docfetcher (or another program/libraries) and allowed the text to be searched and used in Obsidian. That would be a plugin rather than a feature though.
add-on = plugin
But no need for Obsidian's developers to do it themselves.
And if you look at docfetcher's filetypes, they go much further than just PDF. They include the MS Office formats, Open Office, epub, RTF etc. All formats that some Obsidian users would like.
My main points are:
- It's a natural plugin and doesn't need the devs to do it themselves.
- The plugin should cover all frequently used types of text document. There are many threads for Office etc support.
- Many users will accumulate a very large number of files, and would gain flexibility and speed from a plugin that maintained an index.
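To illustrate what "maintained an index" could mean in practice, here is a toy inverted index: each word maps to the set of documents containing it, so a query only touches the index and never re-reads the files. This is a minimal sketch with made-up function names, not how docfetcher or Obsidian actually implement search; a real plugin would also persist the index and update it when files change.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct words."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the names of the documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)
```

The `docs` mapping would be filled with text already extracted from PDFs, Office files and so on; the index itself does not care which format the text came from.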
Indeed. As I said originally.
But it's open source. It can be forked, it can be updated.
If someone can make it smaller or faster I'm sure the original developer would be delighted.
Size here is about the range of formats it accepts and also the need to have an index database. A simple text scraper offers much less. Some users might prefer a simple grep plugin, but there's no reason there can't be both.
I can tell you've never used docfetcher in practice. It works OK. It extracts text from a number of files that other, sometimes expensive, programs choke on. I've never encountered a PDF it won't work on. I work a lot with PDFs and have most of the common commercial editors, which will open and save in all up-to-date formats, and docfetcher has never had a problem with any of them. IIRC the most recent PDF standard was 2017 (and the vast majority of PDFs adhere to earlier standards).
Docfetcher was only an example of the type of functionality needed, and one that could offer an easy route to a plugin, given it's open source. It's cross-platform text search software, but not management software. Fairly simple but effective and free.
It's text search that makes a natural plugin.
Probably at least two in the end, one indexed and one not. An indexed version would allow a much more sophisticated set of features.
I'd like to see this as well (searching available data in a PDF).
Does anyone know if this is likely to happen? Or just completely impossible? I really, really want to stop using evernote and move to obsidian, and this is the only thing stopping me at present
Yeah that's a pity.
Since Electron's internal PDF viewer doesn't support proper PDF handling, I decided to write my own plugin to show only one PDF page, or to cut a picture out of a PDF document and show it inside a note. It's based on pdf.js. You can find the first prototype here.
With pdf.js I'm able to extract the whole text of a PDF. Is there a way to extend search results via plugin?
We used to use pdf.js and moved to the native renderer because it's faster and more accurate (some PDFs do not open correctly with pdf.js).
We might revisit our choice in the future, especially if we need to do other things with PDFs.
The PDF viewer feature is great, but a great viewer definitely needs a search bar.
I've posted in the Electron issue here pushing for this feature and asking for some sort of detail to help the community get involved with this, if necessary. Support searching in native PDF rendering · Issue #9030 · electron/electron · GitHub. Upvote if you agree with any notion and want more information from Deepak, an Electron maintainer who might be able to speak to what can be done (but may not have the time).
Worth noting for Obsidian developers: it does seem that Chromium (Electron's base) uses the browser search for PDFs, not the embedded PDF workflow. So is it possible that Chromium ships search outside the scope of the embedded PDF viewer, and it doesn't get pulled into the viewer for Obsidian?
For the time being, the closest workaround I've found for this issue is to just open the PDF with the native app:
- Click title or outside of the PDF itself, since it eats your scope
- Ctrl + P
- Search "Default app" => Click "Open in Default App"
Is there a current suggested way to work around this issue and allow the text of PDFs to be searchable as part of a master vault search?
Do any of the PDF plugins allow "importing", or is there a suggested way to use Pandoc or something else to create an MD file from a PDF?
Thanks...
I am also quite affected by this. Evernote is pretty good at searching in the full text, including PDFs. My main problem is that I often scan documents into PDF, which I would love to be able to search.
Does anyone have a workaround to search both .md files and .pdf? Particularly tricky on mobile (iOS). I'm even considering implementing this functionality somehow, even if it is as a plugin, because it is the only thing that really stops me from loving my Obsidian experience.
Not sure if it would be possible to integrate something like pdfgrep (https://pdfgrep.org/) in the mobile version of Obsidian.
Alternatively, maybe a plugin could be made to maintain an OCR dump of every PDF file in the vault, which could then be searched. The integration with search might be painful. Maybe it'd be possible to detect that you are trying to open the OCR text version of a PDF and open the PDF instead (maybe even in the right spot).
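A small sketch of the "right spot" part, under two assumptions: the text dump separates pages with form feed characters (as `pdftotext` output does by default), and links of the form `[[file.pdf#page=N]]` open the PDF at that page. The function names are hypothetical; this only shows how a hit in the dump could be mapped back to a page of the original.

```python
def find_pages(extracted: str, query: str) -> list[int]:
    """Return 1-based numbers of the pages whose text contains the query."""
    needle = query.lower()
    if not needle:
        return []
    # Assumption: one form feed ('\f') marks each page break in the dump.
    pages = extracted.split("\f")
    return [n for n, page in enumerate(pages, start=1) if needle in page.lower()]


def page_links(pdf_name: str, extracted: str, query: str) -> list[str]:
    """Build one link per matching page, pointing at the original PDF."""
    # Assumption: '#page=N' is honoured by whatever opens the link.
    return [f"[[{pdf_name}#page={n}]]" for n in find_pages(extracted, query)]
```

A plugin could then show these links in place of hits in the text dump, so the user lands in the PDF rather than in the extracted text.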