Steps to reproduce
I am writing to report a persistent issue with the Obsidian Web Clipper: images are placed incorrectly when clipping content from dynamic, JavaScript-heavy web pages such as X (formerly Twitter). Images often lose their original inline positions within the text and are moved to the end of the clipped content.
I have conducted a thorough analysis of this problem, including testing various web clipping tools and developing custom solutions using headless browser technology (Puppeteer). My findings indicate that accurately preserving image placement from highly dynamic and JavaScript-intensive websites is a significant technical challenge for general-purpose web clippers. This is primarily due to:
Dynamic Content Loading: Content is loaded via JavaScript after the initial page load, making traditional HTML parsing insufficient.
Complex DOM Structure: The intricate and nested DOM structures of these sites make it difficult to precisely target and extract content while maintaining the correct visual relationships between text and images.
Image and Text Relationship: The inline positioning of images is often dynamically determined by CSS and JavaScript, which is challenging to replicate in a static Markdown format.
Anti-Bot Mechanisms: Some websites employ measures that can hinder automated clipping processes.
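To make the image and text relationship point more concrete, here is a minimal sketch of the ordering idea: walk the rendered content tree in document order and emit each image at the point where it occurs, rather than collecting images separately and appending them at the end. The `ClipNode` type and the `toMarkdown` function are hypothetical names I made up for illustration. This is not how the Obsidian Web Clipper is implemented, and a real page would first need to be rendered (for example in a headless browser) before its tree could be walked like this.

```typescript
// Hypothetical minimal node shape standing in for an already rendered DOM tree.
type ClipNode =
  | { kind: "text"; value: string }
  | { kind: "image"; src: string; alt: string }
  | { kind: "element"; children: ClipNode[] };

// Depth-first walk in document order. Each image is emitted where it appears
// relative to the surrounding text, so its inline position is preserved.
function toMarkdown(node: ClipNode, out: string[] = []): string[] {
  switch (node.kind) {
    case "text": {
      const trimmed = node.value.trim();
      if (trimmed.length > 0) out.push(trimmed);
      break;
    }
    case "image":
      out.push(`![${node.alt}](${node.src})`);
      break;
    case "element":
      for (const child of node.children) toMarkdown(child, out);
      break;
  }
  return out;
}

// Made-up sample: an image nested between two pieces of text.
const sample: ClipNode = {
  kind: "element",
  children: [
    { kind: "text", value: "First paragraph." },
    {
      kind: "element",
      children: [{ kind: "image", src: "https://example.com/a.png", alt: "chart" }],
    },
    { kind: "text", value: "Second paragraph." },
  ],
};

const markdown: string = toMarkdown(sample).join("\n\n");
console.log(markdown);
```

In this sketch the image ends up between the two paragraphs because the walk never reorders nodes. The hard part on sites like X is getting a tree where document order matches the visual order in the first place.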
I have attached a detailed report outlining my analysis, the approaches I tested, and the technical challenges encountered. This report also includes alternative suggestions for users facing similar issues.
I believe addressing this issue would significantly enhance the web clipping experience for users who frequently clip content from dynamic web sources. Your consideration of this technical feedback for future developments would be greatly appreciated.
Thank you for your time and attention to this matter.
Sincerely,
Did you follow the troubleshooting guide? [Y/N]
No
Expected result
Photos keep their original inline positions within the clipped text.
Actual result
Photos are placed below the text, and all of them are out of position.
report and html I have problem zip.zip (411.7 KB)