Searchable OCR - Let's get it built!

Gnopps · December 21, 2021, 8:59am

I’d say there are two parts to this request:

1. Extracting data in image-based PDFs, i.e. doing OCR on those PDFs.
2. Including PDF data in global search.

For point 1, I don’t think any new files would have to be created; the PDF itself could just be updated to include the text gathered through OCR. This is also covered in the OCR thread you linked.

For point 2, there is already a separate request. I don’t know how best to solve this, but I agree with you - it shouldn’t result in lots of new files being created.