Question about AI plugins for Obsidian: Layout-aware table extraction from multi-section newspaper PDFs—any Obsidian plugin support?

Lost4EverLost · October 18, 2025, 3:31pm

I have a few thousand newspaper issues, spanning more than three decades, from which I need to extract a multi-page table. Each issue lists around 2,000–2,500 ship names in a six-column table. On every page the table is split into three horizontal sections, each one continuing where the previous section left off, and the same table then carries on across several pages.

Is there any plugin for Obsidian (or perhaps Zotero) that can extract these tables and save each section as a Markdown table, or ideally merge all six to eight pages of an issue into a single Markdown table with wiki-links?
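To make the desired output concrete, here is a minimal sketch of the merge step, assuming the rows have already been extracted per section. The function name `to_markdown`, the column headers, and the sample rows are all hypothetical placeholders, not real data from the newspapers.

```python
def to_markdown(pages, headers, link_column=0):
    """Merge pages -> sections -> rows into one Markdown table.

    pages: list of pages, each a list of sections, each a list of rows
           (a row is a list of cell strings), already in reading order.
    link_column: index of the column to wrap in [[wiki-links]]
                 (assumed here to be the ship name).
    """
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for page in pages:
        for section in page:
            for row in section:
                # Pad short rows so every line has the same number of cells.
                cells = list(row) + [""] * (len(headers) - len(row))
                # Escape pipes so stray characters do not break the table.
                cells = [c.strip().replace("|", "\\|") for c in cells]
                if cells[link_column]:
                    cells[link_column] = f"[[{cells[link_column]}]]"
                lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder sample: two pages, three columns instead of six for brevity.
sample_pages = [
    [[["Fram", "a", "b"]], [["Gjøa", "c", "d"]]],  # page 1, two sections
    [[["Maud", "e", "f"]]],                         # page 2, one section
]
print(to_markdown(sample_pages, ["Ship", "Col2", "Col3"]))
```

Because the sections are simply concatenated in reading order, the same function would cover both cases: one table per section (pass a single section) or one table per issue (pass all pages).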

I’m about to build a local Python script that uses OCR (possibly Tesseract together with OpenCV) and a local LLM for automated text learning. Before I start, I wanted to check here whether any existing tool already supports this kind of layout-aware extraction locally. Each file is over 2 GB, so uploading to services like Transkribus is not an option.
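As a rough illustration of the layout-aware part, the sketch below groups OCR word boxes into table rows and columns. It assumes each word arrives as a `(left, top, text)` tuple (Tesseract's word-level output includes such coordinates) and that the x-position where each column starts is already known for one section. The function name `words_to_rows`, the tolerance value, and the sample boxes are hypothetical.

```python
from bisect import bisect_right


def words_to_rows(words, column_starts, row_tolerance=10):
    """Group OCR word boxes into rows of cell strings.

    words: list of (left, top, text) tuples in pixel coordinates.
    column_starts: sorted x-positions where each column begins (assumed known).
    row_tolerance: max vertical distance (pixels) from the first word of a
                   text line for another word to count as the same line.
    """
    # 1. Cluster words into text lines by their top coordinate.
    text_lines = []  # each entry: [anchor_top, [words on that line]]
    for word in sorted(words, key=lambda w: w[1]):
        top = word[1]
        if text_lines and top - text_lines[-1][0] <= row_tolerance:
            text_lines[-1][1].append(word)
        else:
            text_lines.append([top, [word]])

    # 2. Within each line, assign words to columns by their left coordinate.
    table = []
    for _, line_words in text_lines:
        cells = [[] for _ in column_starts]
        for left, _, text in sorted(line_words):  # left-to-right order
            col = max(bisect_right(column_starts, left) - 1, 0)
            cells[col].append(text)
        table.append([" ".join(cell) for cell in cells])
    return table


# Placeholder boxes: two text lines, two columns starting at x=0 and x=100.
sample_words = [
    (105, 52, "Oslo"), (10, 50, "Kong"), (48, 51, "Harald"),
    (12, 80, "Fram"), (102, 81, "Bergen"),
]
rows = words_to_rows(sample_words, column_starts=[0, 100])
print(rows)
```

Running this once per horizontal section (after cropping the page into its three bands) would yield the row lists in reading order. Fixed column positions are a simplification; skewed or warped scans would likely need deskewing or per-page column detection first.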

I plan to use the extracted data in a combined genealogy and Norwegian mercantile ship research project that I tinker with in my spare time, using Obsidian, Foam for VS Code, Aeon Timeline, Gramps, and Tulip/Cytoscape/Gephi.

Note: This text was translated and corrected by Copilot for flow and readability, from Norwegian to English.