Question about AI plugins for Obsidian: is there any plugin support for layout-aware table extraction from multi-section newspaper PDFs?

I have a few thousand newspaper issues—spanning over three decades—from which I need to extract a multi-page table. Each issue contains around 2,000–2,500 ship names, presented in a six-column table that flows across three horizontal sections per page. Each section holds a continuation of the same table, and the layout continues fluidly across multiple pages.

Is there any plugin in Obsidian (or perhaps Zotero) that can extract these tables and save each section as a Markdown table, or ideally merge all six to eight pages of an issue into a single Markdown table with wiki-links?
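To make the target output concrete, here is a minimal sketch of the merge step, assuming the sections have already been recognized as lists of rows. The function names, the two-column header, and the choice to wiki-link the first column are all placeholders of mine, not something an existing plugin provides; the real table has six columns.

```python
from typing import List, Sequence


def clean_cell(value: object) -> str:
    """Collapse whitespace and replace pipes, which would break a Markdown table row."""
    return " ".join(str(value).replace("|", "/").split())


def merge_sections(
    sections: Sequence[Sequence[Sequence[str]]],
    header: Sequence[str],
    link_column: int = 0,
) -> str:
    """Merge rows from several page sections into one Markdown table.

    `sections` is a list of sections in reading order; each section is a list
    of rows; each row is a list of cell strings. The cell in `link_column`
    (assumed here to be the ship name) is wrapped in an Obsidian wiki-link.
    """
    width = len(header)
    lines: List[str] = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for rows in sections:
        for row in rows:
            cells = [clean_cell(c) for c in row]
            # Pad short rows and truncate long ones so every row matches the header.
            cells = (cells + [""] * width)[:width]
            if cells[link_column]:
                cells[link_column] = f"[[{cells[link_column]}]]"
            lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data: two sections, two columns for brevity.
    demo = [[["Fram", "Oslo"]], [["Gjøa", "Bergen"], ["Maud"]]]
    print(merge_sections(demo, ["Ship", "Port"]))
```

Padding and truncating rows to the header width is a deliberate choice in this sketch: OCR output will probably contain rows with missing or extra cells, and a ragged row would otherwise break the table in Obsidian.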

I’m about to build a local Python script using AI-assisted text recognition (OCR), possibly with Tesseract and OpenCV, and a local LLM for automated text learning. But before I start, I wanted to check here whether any existing tools already support this kind of layout-aware extraction locally. Each file is over 2 GB, so uploading to services like Transkribus is not an option.
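A rough sketch of how the OCR part of such a script might start, under several assumptions of mine: each PDF page has already been rendered to an image file, the three sections are roughly equal-height bands (real pages would probably need the section boundaries detected instead), and the `opencv-python` and `pytesseract` packages plus Tesseract's Norwegian (`nor`) language data are installed. The function names are hypothetical.

```python
from typing import List, Tuple


def band_bounds(height: int, n_bands: int = 3, overlap: int = 0) -> List[Tuple[int, int]]:
    """Split `height` pixel rows into `n_bands` horizontal bands.

    Returns (top, bottom) row ranges; `overlap` extends each band by that many
    rows on both sides so text lying on a boundary is not cut in half.
    """
    edges = [round(i * height / n_bands) for i in range(n_bands + 1)]
    return [
        (max(0, edges[i] - overlap), min(height, edges[i + 1] + overlap))
        for i in range(n_bands)
    ]


def ocr_page_bands(image_path: str, lang: str = "nor") -> List[str]:
    """OCR each horizontal band of one page image and return the raw text per band."""
    # Imported here so band_bounds() can be used without the OCR dependencies.
    import cv2
    import pytesseract

    page = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if page is None:
        raise FileNotFoundError(image_path)
    # Otsu thresholding gives a clean black-and-white image for Tesseract.
    _, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    texts = []
    for top, bottom in band_bounds(binary.shape[0]):
        band = binary[top:bottom, :]
        # --psm 6 tells Tesseract to treat the band as one uniform block of text.
        texts.append(pytesseract.image_to_string(band, lang=lang, config="--psm 6"))
    return texts
```

This only returns raw text per section; splitting each band into the six columns and parsing rows would still have to be done on top of it.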

I plan to use the extracted data in a combined genealogy and Norwegian mercantile ship research project I tinker with in my spare time, using Obsidian, Foam for VS Code, Aeon Timeline, Gramps, and Tulip/Cytoscape/Gephi.


Note: This text was translated and corrected by Copilot for flow and readability, from Norwegian to English.
