Obsidian WebClipper no longer recognizes headings (h1, h2, …, h6)

Suddenly, the WebClipper bookmarklet no longer recognizes headings—by which I mean it leaves them out completely.

What I’m trying to do

Clip text from webpages that includes headings, with the headings intact.

Things I have tried

Restarting, searching Google.

It could depend on the website you’re trying to clip :blush:

I encountered that issue specifically on Wikipedia and after some investigation from Kepano, it could be due to a bug in Readability (the library used to parse the content from a website)…

I was kindly given a {{selectorHtml}} variable that works for Wikipedia (only) to bypass that issue …

Amazing! Yes, Wikipedia was the first to make me realize that all headings were deleted. But after your reply here I tried it on others and they are all missing headings except for plato.stanford.edu.

Could you share the selector you used and how to implement your solution? I just read this so I guess I have some idea of what you did.

Thanks. I’m glad I’m not the only one and that there’s a solution.

Oh, and I see that the topic is discussed here in more detail. I just need to find out how to modify the (already-existing?) template (singled out for Wikipedia use?).

I can at least share the selector I was given for Wikipedia (only), which works perfectly :blush: :

{{selectorHtml:#mw-content-text|remove_html:(".navbox,.printfooter,.side-box")|markdown}}

… which, in a dedicated “wikipedia” template, I use in place of {{content}} in the Note content section of the template :blush:

UPDATE: I’m inside the wikipedia-clipper.json template and I think I see the important line:

	"noteContentFormat": "{{selectorHtml:#mw-content-text|remove_html:(\".navbox,.printfooter,.side-box\")|markdown}}",
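The backslashes before the quotes in that line are ordinary JSON string escaping, not part of the template syntax. A minimal Python sketch (Python is used here only for illustration; the clipper itself does not run it) shows that the string typed into the template editor and the string stored in the .json file are the same value, and that backslashes get doubled in the same way:

```python
import json

# The template string as it would be typed into the Note content box.
note_content = (
    '{{selectorHtml:#mw-content-text'
    '|remove_html:(".navbox,.printfooter,.side-box")|markdown}}'
)

# Serializing to JSON escapes each inner double quote as \" .
as_json = json.dumps({"noteContentFormat": note_content})
print(as_json)

# Parsing the JSON gives back the original string, with no backslashes.
assert json.loads(as_json)["noteContentFormat"] == note_content

# Backslashes are doubled by the same mechanism, so a regex fragment
# written as \/wiki\/ appears as \\/wiki\\/ inside a .json file.
print(json.dumps(r"\/wiki\/"))
```

So when pasting into the extension's settings, use the unescaped form; the escaped form only belongs in a hand-edited .json file.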

Ok, I see your response now. Thanks!

So I go to Safari Preferences > Extensions > Obsidian Web Clipper and make a new template using

https:\\/\\/[a-z]{2,3}\\.wikipedia\\.org\\/wiki\\/?$/

as the trigger and replace the

{{content}}

with your string?

I’ll try now …

Good grief. I think it worked. This is the first time I’ve ever solved a problem on a forum in real time. Thanks brah!

Except the damn regex trigger isn’t working. The template that gets used is the one on the top of the list, so I have to set the template manually each time. I’ll try again later. Thanks again.

This seems to work:

/^https:\/\/[a-z]{2,3}\.wikipedia\.org\/wiki\/[^$]*$/

Damn it man! That’s the one! I thought I tried every revision of the one they show inside the template on GitHub, but I had the ending wrong! The one I used was this:

https:\/\/[a-z]{2,3}\.wikipedia\.org\/wiki\/?$/

Thanks a million!
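For anyone wondering why the ending matters, here is a small sketch that runs both pattern bodies from this thread against two URLs. It uses Python's re module as a stand-in for the browser's regex engine, which should behave the same for these particular patterns; the article URL is just an illustrative example. Separately, the failing version has no opening slash, and if the clipper only treats slash-delimited triggers as regexes (an assumption, not verified here), that alone could stop it from matching.

```python
import re

# Pattern bodies copied from the thread, without the / delimiters.
ENDS_AT_WIKI = r"https:\/\/[a-z]{2,3}\.wikipedia\.org\/wiki\/?$"
ALLOWS_TITLE = r"^https:\/\/[a-z]{2,3}\.wikipedia\.org\/wiki\/[^$]*$"


def matches(pattern: str, url: str) -> bool:
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


index_url = "https://en.wikipedia.org/wiki/"
article_url = "https://en.wikipedia.org/wiki/Obsidian"  # illustrative URL

# \/?$ requires the URL to end right after "wiki" or "wiki/",
# so an article title after the slash makes the match fail.
print(matches(ENDS_AT_WIKI, index_url), matches(ENDS_AT_WIKI, article_url))

# [^$]*$ allows any run of characters (other than a literal "$")
# between "/wiki/" and the end of the URL, so article pages match.
print(matches(ALLOWS_TITLE, index_url), matches(ALLOWS_TITLE, article_url))
```

The first line prints True False and the second True True: the original ending only ever matched the bare /wiki/ address, never an article.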

Helpful thread. Can you guys share a little bit more about the principles of what you did? I’m new to all of this, and running into the same problem but for a different website (not wikipedia), and the web clipper is omitting H2, H3, etc.

If I’m reading your troubleshooting correctly, you need to get the *.json template of the particular website, then inspect it?

Can you also point me to resources that explain what this means:

https://[a-z]{2,3}.wikipedia.org/wiki/?$/

so that I can customize a selector for any other sites that encounter this problem?

Appreciate the help
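
As a rough sketch of the general principle in this thread: the fix has two parts, a trigger regex that matches the site's URLs and a note content string that swaps {{content}} for a {{selectorHtml:…}} variable pointing at the page's main content element. The helpers below are hypothetical (build_trigger and build_note_content are not part of the clipper), the host example.com and selector article are placeholders, and the right CSS selector for another site has to be found by inspecting that page in the browser's developer tools. Assumes Python 3.7 or later.

```python
import re
from typing import Sequence


def build_trigger(host: str, path_prefix: str = "/") -> str:
    """Hypothetical helper: a /regex/ trigger for https pages under host + path_prefix."""
    body = "^https://" + re.escape(host) + re.escape(path_prefix) + ".*$"
    # Escape forward slashes so the pattern can sit between / delimiters.
    return "/" + body.replace("/", r"\/") + "/"


def build_note_content(css_selector: str, remove: Sequence[str] = ()) -> str:
    """Hypothetical helper: compose the note content string used in this thread."""
    parts = ["selectorHtml:" + css_selector]
    if remove:
        # Same filter shape as the Wikipedia example: remove_html:("a,b,c")
        parts.append('remove_html:("' + ",".join(remove) + '")')
    parts.append("markdown")
    return "{{" + "|".join(parts) + "}}"


# Reproduces the Wikipedia string shared earlier in the thread.
print(build_note_content("#mw-content-text", [".navbox", ".printfooter", ".side-box"]))

# Placeholder site and selector, for illustration only.
print(build_trigger("example.com", "/blog/"))
print(build_note_content("article"))
```

The generated trigger can be checked before use by stripping the outer slashes and testing it against a few real URLs from the site.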