It could depend on the website you’re trying to clip …
I encountered that issue specifically on Wikipedia, and after some investigation by Kepano, it looks like it could be due to a bug in Readability (the library used to parse the content from a website)…
I was kindly given a {{selectorHTML}} that works for Wikipedia (and only Wikipedia) to bypass that issue …
Amazing! Yes, Wikipedia was the first to make me realize that all headings were deleted. But after your reply here I tried it on others and they are all missing headings except for plato.stanford.edu.
Could you share the selector you used and how to implement your solution? I just read this so I guess I have some idea of what you did.
Thanks. I’m glad I’m not the only one and that there’s a solution.
Oh, and I see that the topic is discussed here in more detail. I just need to find out how to modify the (already-existing?) template (singled out for Wikipedia use?).
Except the damn regex trigger isn’t working. The template that gets used is the one on the top of the list, so I have to set the template manually each time. I’ll try again later. Thanks again.
Damn it man! That’s the one! I thought I tried every revision of the one they show inside the template on GitHub, but I had the ending wrong! The one I used was this:
Helpful thread. Can you guys share a little bit more about the principles of what you did? I’m new to all of this, and running into the same problem but for a different website (not Wikipedia), where the web clipper is omitting H2, H3, etc.
If I’m reading your troubleshooting correctly, you need to get the *.json template of the particular website, then inspect it?
Can you also point me to resources that explain what this means:
https://[a-z]{2,3}.wikipedia.org/wiki/?$/
so that I can customize a selector for any other sites that encounter this problem?
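For anyone else puzzling over that line: it is a regular expression used as a URL trigger, so the template is picked automatically when the page address matches. Here is a rough sketch in Python of how such a pattern behaves. The pattern below is my own hypothetical reconstruction, not the exact trigger from the template (the one quoted above looks like it lost some characters when it was posted; as written, `/wiki/?$` would only match an address that ends right at `/wiki` or `/wiki/`). The name `matches_trigger` and the sample URLs are made up for illustration.

```python
import re

# Hypothetical trigger pattern (an assumption, not the template's exact regex):
#   ^https://      the address must start with https://
#   [a-z]{2,3}     two or three lowercase letters, e.g. "en", "fr", "ceb"
#   \.             a literal dot (an unescaped . would match any character)
#   wikipedia\.org/wiki/   the fixed part of the address
#   .*$            anything at all up to the end of the address
WIKIPEDIA_TRIGGER = re.compile(r"^https://[a-z]{2,3}\.wikipedia\.org/wiki/.*$")


def matches_trigger(url: str) -> bool:
    """Return True when the URL would select the (hypothetical) Wikipedia template."""
    return WIKIPEDIA_TRIGGER.match(url) is not None


# Made-up sample addresses to try the pattern on.
samples = [
    "https://en.wikipedia.org/wiki/Regular_expression",
    "https://fr.wikipedia.org/wiki/Expression_rationnelle",
    "https://simple.wikipedia.org/wiki/Regex",   # "simple" is longer than 3 letters
    "https://plato.stanford.edu/entries/logic/",
]

for url in samples:
    print(matches_trigger(url), url)
```

To adapt this to another site, the idea would be to keep the fixed parts of the address literal (escaping the dots) and use `.*` for the part that changes from page to page, then test a few real addresses before pasting the pattern into the template's trigger field.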