It appears the Internet Archive cannot make snapshots of Obsidian Publish sites. Only 404 pages appear in the Wayback Machine.
I have confirmed that my Publish site allows web crawlers and is discoverable. The issue appears to be unique to the Internet Archive because alternative archive sites (like archive.today) can create snapshots without issue.
Proposed solution
Make it possible for the Internet Archive to properly crawl and create snapshots of Publish pages.
Current workaround (optional)
I have not been able to come up with a workaround.
Just to clarify, we do not block Internet Archive. We think that their crawler isn’t able to run and then save Obsidian Publish pages (which are not static HTML).
This is a fundamental architectural problem. Obsidian Publish is a JavaScript single-page application (SPA) whose content is rendered client-side. When the Wayback Machine’s crawler fetches a page, it captures the initial HTML shell before any JavaScript executes, and that shell is essentially an empty container. That is presumably why the snapshot shows a 404 page instead of the note. The Wayback Machine has historically struggled with JS-heavy sites because it archives the raw HTTP responses rather than the rendered DOM.
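To make the "empty container" point concrete, here is a minimal sketch of what a client that does not run JavaScript gets out of a raw HTML response. Both HTML strings are made-up examples, not the actual markup Obsidian Publish serves, and `visible_text` is just a helper name chosen for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects the body text a non-JavaScript client can read from raw HTML."""

    # Tags whose text is not page content (script/style bodies, the tab title).
    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside the skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html: str) -> str:
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical SPA shell: the content would only arrive after app.js runs.
SPA_SHELL = (
    '<!doctype html><html><head><title>My notes</title>'
    '<script src="/app.js"></script></head>'
    '<body><div id="app"></div></body></html>'
)

# Hypothetical static page: the content is already in the markup.
STATIC_PAGE = (
    '<!doctype html><html><head><title>My notes</title></head>'
    '<body><h1>Welcome</h1><p>First note.</p></body></html>'
)

print(repr(visible_text(SPA_SHELL)))
print(repr(visible_text(STATIC_PAGE)))
```

The shell yields no readable text at all, while the static page yields its heading and paragraph. A crawler that stores only the raw response is in the same position as this parser.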
Obsidian could potentially add server-side rendering or a static HTML fallback for crawlers.
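As a rough sketch of what a static HTML fallback could look like (this is not how Obsidian Publish works today, and every name here is hypothetical), the server could build a plain page with the note text in the markup and hand that to crawler-like clients. The user-agent substrings below are invented for illustration only.

```python
import html

# Hypothetical user-agent substrings treated as non-JavaScript clients.
CRAWLER_TOKENS = ("bot", "crawler", "spider")

# Hypothetical SPA shell served to regular browsers.
SPA_SHELL = (
    '<!doctype html><html><body><div id="app"></div>'
    '<script src="/app.js"></script></body></html>'
)


def render_static(title: str, paragraphs: list[str]) -> str:
    """Build a plain HTML page with the note text already in the markup."""
    body = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!doctype html><html><head>"
        f"<title>{html.escape(title)}</title></head>"
        f"<body><h1>{html.escape(title)}</h1>{body}</body></html>"
    )


def choose_response(user_agent: str, title: str, paragraphs: list[str]) -> str:
    """Return static HTML for crawler-like clients, the SPA shell otherwise."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return render_static(title, paragraphs)
    return SPA_SHELL
```

User-agent sniffing is fragile, so a sturdier variant of the same idea would send the static markup to everyone and let JavaScript enhance it afterwards.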
I use Obsidian Publish as a public, canonical knowledge base. Increasingly, I also use web-based Large Language Model (LLM) systems (e.g. ChatGPT and similar tools) as readers and research assistants over public web content.
Currently, Obsidian Publish pages are not reliably readable by non-JavaScript clients. Requests to canonical Publish URLs always return a JavaScript SPA shell, and the actual page content is fetched client-side via JavaScript from an internal endpoint. As a result, clients that do not run JavaScript, including these LLM-based readers, receive no page content.
I would love for Publish to produce pages that at least function in a basic way when JavaScript isn’t available. Currently if JS doesn’t load, you get nothing — just a blank page. The graph, I understand. But refusing to show text and links? People deserve better.
Currently, when you click “View source” on an Obsidian Publish website, you get JavaScript instead of content. Bots that do not run JavaScript cannot see the text, so the site’s content is not included in the training of AI models.
Proposed solution
At least give users an option to include the page text in the served HTML so that bots can read the content.
This is my own personal bugbear, but a docs site really shouldn’t be reliant on JS to render - it’s just a bunch of text. Rule of Least Power and all that.
But if you don’t agree from a code elegance point of view, then perhaps the docs being invisible to LLM agents might be a reason? I just encountered this when Claude Code tried to use the docs but got knocked back, reporting: “The docs site is JS-rendered so WebFetch can’t read it.”
For permanent content (e.g. documentation), I really want to end up with static web pages. I’ve briefly tried Obsidian Publish, but I have a few issues with it: 1) it’s noticeably slower than a static site, 2) it relies on a specific server “somewhere”, and 3) it can’t be saved for long-term use (e.g. Internet Archive).
Proposed solution
This is not about trying to avoid any fees. On the contrary, I’ll gladly pay a fair amount for a mechanism which gives me a set of static files to replicate anywhere (again: for long-term permanence, i.e. years / decades). My proposal would be to provide an export feature for Obsidian Publish - go ahead, ask a hefty sum for it. Just give me the option to choose this route, don’t force me to buy into dynamically-served web pages for something which is a fantastic tool, even for things which are meant to only change occasionally.
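For illustration only, here is a minimal sketch of the kind of output such an export could produce: one plain HTML file per note, mirroring the vault's folder layout. It is not an existing Obsidian feature. The function names are hypothetical, and the Markdown handling is deliberately tiny (a block starting with `# ` becomes a heading, everything else a paragraph), so links, embeds and the rest of Obsidian's syntax are out of scope.

```python
import html
from pathlib import Path


def markdown_to_html(text: str) -> str:
    """Tiny subset: '# ' heading blocks and blank-line-separated paragraphs."""
    parts = []
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            parts.append(f"<h1>{html.escape(block[2:].strip())}</h1>")
        else:
            # Collapse internal line breaks so the paragraph is one line.
            parts.append(f"<p>{html.escape(' '.join(block.split()))}</p>")
    return "\n".join(parts)


def export_vault(vault_dir: Path, out_dir: Path) -> list[Path]:
    """Write one .html file per .md note, mirroring the folder layout."""
    written = []
    for note in sorted(vault_dir.rglob("*.md")):
        target = (out_dir / note.relative_to(vault_dir)).with_suffix(".html")
        target.parent.mkdir(parents=True, exist_ok=True)
        body = markdown_to_html(note.read_text(encoding="utf-8"))
        page = (
            '<!doctype html>\n<html><head><meta charset="utf-8">'
            f"<title>{html.escape(note.stem)}</title></head>\n"
            f"<body>\n{body}\n</body></html>\n"
        )
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Calling `export_vault(Path("my-vault"), Path("site"))` (both paths are placeholders) would leave a `site` folder of self-contained files that any web server, or the Internet Archive, can handle without running JavaScript.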
Current workaround (optional)
Generating static content can be done with third-party tools, but they are cumbersome, have no clear long-term support commitment, and don’t work as seamlessly as Obsidian Publish.