Automating text cleanup & structure (sloppy web clips & messy notes)?

street3r · February 1, 2025, 3:56pm

I’m guessing there’s an existing plug-in that does this (probably with AI…) but for the life of me I can’t find something that specifically addresses this use case. Please correct me if I’m wrong!

I have a mountain of messy text that I brought over to Obsidian. Mostly sloppy web clips from other tools/platforms that include SEO garbage, header/footer text, nav text, garbage image links and tags, etc. I’ve been trying to work through the problem manually for a long time, but at this point I would need to take a leave of absence from work, send my kid to live with a relative, and take out a loan to survive long enough to get through it all.

SO: I’m looking for a plugin (AI or not) that can reformat these inconsistent and sloppy clips into a consistent format, strip out the obvious garbage (in the same way a browser does with Reader Mode), and ideally extract and standardize some of the metadata (origin, link, author, date) that exists in most of these files (albeit inconsistently placed and rendered). Bonus points for AI tools that can suggest tags based on content etc.

I’m less concerned than many Obsidian forum users about AI privacy considerations, since this is primarily content clipped from the open web, but I’m not so oblivious that I want to let 100 different AI plugins loose on my vault while I look for something that can do this.

Has anyone found a method for automating the cleanup of messy web clips?

Jopp · February 1, 2025, 4:26pm

Since you didn’t mention what documents you want to tweak, I assume all your documents have been converted to markdown.

I don’t use AI plugins myself, but here are some plugins that could help with this task:

Linter (please configure its preferences)
auto note mover
file diff
note refactor
quickadd
time keep
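
If none of these fits, a small script outside Obsidian might cover the mechanical part (stripping junk, pulling out link/author/date). Here is a rough Python sketch, not a tested solution for any particular clipper. The junk phrases, the regexes, and the function name `clean_clip` are all assumptions made up for illustration, and the author pattern will misfire on sentences that start with "By". Try it on a copy of the vault first.

```python
import re

# Hypothetical patterns: tune them to whatever junk your own clips contain.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\([^)]*\)")
JUNK_LINE = re.compile(
    r"^\s*(share|subscribe|sign in|log in|menu|skip to content|cookie)\b",
    re.IGNORECASE,
)
URL = re.compile(r"https?://[^\s)>\]]+")
AUTHOR = re.compile(r"^\s*(?:by|author:)\s+(.+)$", re.IGNORECASE | re.MULTILINE)
DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def clean_clip(text: str) -> str:
    """Return the clip with frontmatter prepended and obvious junk removed."""
    # Drop images first so their URLs are not mistaken for the source link.
    text = IMAGE_LINK.sub("", text)

    # Take the first match of each pattern as the metadata value.
    meta = {}
    for key, pattern in (("source", URL), ("author", AUTHOR), ("date", DATE)):
        match = pattern.search(text)
        if match:
            meta[key] = match.group(match.lastindex or 0).strip()

    # Remove lines that start with a known junk phrase, then collapse blank runs.
    kept = [line.rstrip() for line in text.splitlines() if not JUNK_LINE.match(line)]
    body = re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()

    # Naive quoting: values that contain double quotes would need escaping.
    front = "\n".join(f'{k}: "{v}"' for k, v in meta.items())
    return f"---\n{front}\n---\n\n{body}\n" if meta else body + "\n"
```

You could loop over the `.md` files with `pathlib.Path(...).rglob("*.md")` and write the result back. Tag suggestions would still need an AI tool or manual review.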

street3r · February 1, 2025, 10:01pm

Thank you so much, I will look into these! And you are correct that all files are plaintext/markdown.