Obsidian WebClipper no longer recognizes headings (h1, h2, …, h6)

DamienKarras · November 6, 2024, 8:35am

Suddenly, the WebClipper bookmarklet no longer recognizes headings—by which I mean it leaves them out completely.

What I’m trying to do

Clip text from webpages that includes headings, with the headings intact.

Things I have tried

Restarting, searching Google.

Pch · November 6, 2024, 9:28am

It could depend on the website you’re trying to clip …

I ran into that issue specifically on Wikipedia. After some investigation by Kepano, it looks like it could be due to a bug in Readability (the library used to parse the content of a web page)…

I was kindly given a {{selectorHtml}} variable that works for Wikipedia (and only Wikipedia) to bypass that issue …

DamienKarras · November 6, 2024, 12:31pm

Amazing! Yes, Wikipedia was the first to make me realize that all headings were deleted. But after your reply here I tried it on others and they are all missing headings except for plato.stanford.edu.

Could you share the selector you used and how to implement your solution? I just read this so I guess I have some idea of what you did.

Thanks. I’m glad I’m not the only one and that there’s a solution.

Oh, and I see that the topic is discussed here in more detail. I just need to find out how to modify the (already-existing?) template (singled out for Wikipedia use?).

Pch · November 6, 2024, 12:40pm

I can at least share the selector I was given for Wikipedia (only), which works perfectly:

{{selectorHtml:#mw-content-text|remove_html:(".navbox,.printfooter,.side-box")|markdown}}

… which I use in a dedicated “wikipedia” template, in place of {{content}} in the Note content section of the template.

DamienKarras · November 6, 2024, 12:47pm

UPDATE: I’m inside the wikipedia-clipper.json template and I think I see the important line:

	"noteContentFormat": "{{selectorHtml:#mw-content-text|remove_html:(\".navbox,.printfooter,.side-box\")|markdown}}",

Ok, I see your response now. Thanks!

So I go to Safari Preferences > Extensions > Obsidian Web Clipper and make a new template using

https:\\/\\/[a-z]{2,3}\\.wikipedia\\.org\\/wiki\\/?$/

as the trigger and replace the

{{content}}

with your string?

I’ll try now …

DamienKarras · November 6, 2024, 12:49pm

Good grief. I think it worked. This is the first time I’ve ever solved a problem on a forum in real time. Thanks brah!

DamienKarras · November 6, 2024, 1:02pm

Except the damn regex trigger isn’t working. The template that gets used is the one on the top of the list, so I have to set the template manually each time. I’ll try again later. Thanks again.

anon45210282 · November 6, 2024, 2:54pm

This seems to work:

/^https:\/\/[a-z]{2,3}\.wikipedia\.org\/wiki\/[^$]*$/

DamienKarras · November 6, 2024, 10:28pm

Damn it man! That’s the one! I thought I tried every revision of the one they show inside the template on GitHub, but I had the ending wrong! The one I used was this:

https:\/\/[a-z]{2,3}\.wikipedia\.org\/wiki\/?$/

Thanks a million!
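
For anyone wondering why the ending matters, here is a minimal sketch in Python that compares the two triggers from this thread. It assumes the trigger is tested against the full page URL with an unanchored search; that is a guess about how Web Clipper evaluates it, not something confirmed here. The surrounding slashes and the \/ escapes are regex-literal syntax and are dropped for Python, and the sample URLs are made up for illustration.

```python
import re

# The trigger that did not fire on article pages: after "/wiki" it allows
# only an optional "/" before the end of the URL, so an article title
# such as "Obsidian" has nowhere to go.
FAILING = re.compile(r"https://[a-z]{2,3}\.wikipedia\.org/wiki/?$")

# The trigger that worked: anchored at the start, and "[^$]*" consumes
# everything after "/wiki/" (any character except a literal "$").
WORKING = re.compile(r"^https://[a-z]{2,3}\.wikipedia\.org/wiki/[^$]*$")

# Hypothetical sample URLs, chosen only to exercise both patterns.
urls = [
    "https://en.wikipedia.org/wiki/Obsidian",
    "https://en.wikipedia.org/wiki/",
    "https://plato.stanford.edu/entries/plato/",
]


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None


for url in urls:
    print(url, "| failing:", matches(FAILING, url), "| working:", matches(WORKING, url))
```

Under these assumptions the first pattern only matches the bare /wiki/ address, which would explain why the template had to be picked by hand on every article.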

k3russ0 · November 15, 2024, 12:07am

Helpful thread. Can you guys share a bit more about the principles behind what you did? I’m new to all of this and running into the same problem on a different website (not Wikipedia), where the web clipper omits H2, H3, etc.

If I’m reading your troubleshooting correctly, you need to get the *.json template of the particular website, then inspect it?

Can you also point me to resources that explain what this means:

https://[a-z]{2,3}.wikipedia.org/wiki/?$/

so that I can customize a selector for any other sites that encounter this problem?

Appreciate the help
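
On the trigger half of this question, a rough sketch in Python of how a trigger in the same shape as the working Wikipedia one could be built for another site. The helper name trigger_for_prefix and the example.com address are hypothetical, and it assumes Python's escaping rules are close enough to what the trigger field expects for an ordinary URL. It only decides which template fires; the selector itself still has to be found by inspecting the page in question.

```python
import re


def trigger_for_prefix(url_prefix: str) -> str:
    """Return a slash-delimited regex that matches any URL starting with url_prefix."""
    # re.escape neutralises regex metacharacters such as "." in the host name.
    escaped = re.escape(url_prefix)
    # Forward slashes are escaped to mirror the "\/" style of the triggers in this thread.
    escaped = escaped.replace("/", r"\/")
    return f"/^{escaped}.*$/"


# example.com is a placeholder, not a site mentioned in the thread.
trigger = trigger_for_prefix("https://example.com/blog/")
print(trigger)

# Sanity check: drop the surrounding slashes and try the pattern on sample URLs.
pattern = re.compile(trigger[1:-1])
print(bool(pattern.search("https://example.com/blog/some-post")))
print(bool(pattern.search("https://example.com/about")))
```

Reading the result left to right: ^ pins the match to the start of the URL, each \. is a literal dot, each \/ is a literal slash, .* accepts whatever follows the prefix, and $ marks the end.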

system · December 13, 2024, 12:08am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.