I’d say there are two parts to this request:
- Extracting data from image-based PDFs, i.e. running OCR on those PDFs.
- Including PDF data in global search.
For point 1, I don’t think any new files would have to be created: the PDF itself could just be updated to include the data gathered through OCR. This is also covered in the OCR thread you linked.
For point 2, there is already a separate request. I don’t know how best to solve this, but I agree with you that it shouldn’t result in lots of new files being created.