Searchable OCR - Let's get it built!

I’d say there are two parts to this request:

  1. Extracting data from image-based PDFs, i.e. doing OCR on those PDFs.
  2. Including PDF data in global search.

For point 1 I don’t think any new files would have to be created: the PDF itself could just be updated to include the data gathered through OCR. This is also covered in the OCR thread you linked.
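
To make the "no new files" idea concrete, here is a rough sketch of how an in-place update could work. It is only an illustration, not how the app does or would do it: `ocr_in_place` and `run_ocr` are names I made up, and `run_ocr` stands in for whatever OCR engine would actually be used (it reads the source PDF and writes an OCRed copy to the given destination). The OCRed copy goes to a temporary file next to the original and is then swapped in, so only the original file name remains afterwards.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ocr_in_place(pdf_path: Path, run_ocr: Callable[[Path, Path], None]) -> None:
    """Replace pdf_path with an OCRed copy without leaving extra files behind.

    run_ocr is a hypothetical callable: run_ocr(source, destination) must
    write the OCRed version of source to destination.
    """
    pdf_path = Path(pdf_path)
    # Create the temp file in the same directory so the final swap stays on
    # one filesystem.
    fd, tmp_name = tempfile.mkstemp(suffix=".pdf", dir=pdf_path.parent)
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        run_ocr(pdf_path, tmp_path)
        # Swap the OCRed copy in under the original name.
        os.replace(tmp_path, pdf_path)
    finally:
        # If OCR failed, remove the leftover temp file; the original PDF is
        # untouched. After a successful swap there is nothing left to remove.
        tmp_path.unlink(missing_ok=True)
```

If the OCR step fails halfway, the original PDF is left as it was, which seems like the safer behaviour for documents people care about.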

For point 2 there is already a separate request. I don’t know how best to solve this, but I agree with you that it shouldn’t mean creating lots of new files.