Publish: Serve Static Content - Enable Wayback Machine / LLMs to crawl Publish websites without running JavaScript

Use case or problem

It appears the Internet Archive cannot make snapshots of Obsidian Publish sites. Only 404 pages appear in the Wayback Machine.

I have confirmed that my Publish site allows web crawlers and is discoverable. The problem appears to be unique to the Internet Archive, because other archive sites (like archive.today) can create snapshots without issue.

Proposed solution

Make it possible for the Internet Archive to properly crawl and create snapshots of Publish pages.

Current workaround (optional)

I have not been able to come up with a workaround.

Related feature requests (optional)

n/a


Just to clarify, we do not block the Internet Archive. We think their crawler isn’t able to run and then save Obsidian Publish pages, which are not static HTML.

From Claude AI:

This is a fundamental architectural problem. Obsidian Publish is a JavaScript single-page application (SPA) where content is rendered client-side. When the Wayback Machine’s crawler fetches a page, it captures the initial HTML shell before JavaScript executes, which is essentially an empty container; hence the 404 you see. The Wayback Machine has historically struggled with JS-heavy sites because it archives the raw HTTP response rather than the rendered DOM.

Obsidian could potentially add server-side rendering or a static HTML fallback for crawlers.
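To make the point concrete, here is a rough Python sketch of what a client that does not execute JavaScript gets out of a page: it keeps the text present in the raw HTML and drops the bodies of `<script>` and `<style>` elements. Both HTML snippets are made-up examples, not Obsidian Publish's actual markup, and real crawlers (including the Wayback Machine's) are far more involved.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text a non-JS client can read, skipping script/style bodies."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text that is outside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical SPA shell: the note text is only loaded later by JavaScript.
SPA_SHELL = """<!doctype html>
<html><head><title>My Notes</title>
<script src="/app.js"></script></head>
<body><div id="app"></div>
<script>window.loadPage("/My Note");</script></body></html>"""

# Hypothetical server-rendered page: the note text is in the HTML itself.
RENDERED = """<!doctype html><html><head><title>My Notes</title></head>
<body><article><h1>My Note</h1>
<p>Hello world, see <a href="/Other">Other</a>.</p></article></body></html>"""
```

Running `visible_text(SPA_SHELL)` yields only the page title, while `visible_text(RENDERED)` includes the note body, which is roughly the difference between what an archiver or LLM fetcher stores in the two cases.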


Use case or problem

I use Obsidian Publish as a public, canonical knowledge base. Increasingly, I also use web-based Large Language Model (LLM) systems (e.g. ChatGPT and similar tools) as readers and research assistants over public web content.

Currently, Obsidian Publish pages are not reliably readable by non-JavaScript clients. Requests to canonical Publish URLs always return a JavaScript SPA shell, and the actual page content is fetched client-side via JavaScript from an internal endpoint. As a result:

  • Non-JS clients cannot reliably read Publish page content
  • Page existence cannot be determined via HTTP semantics
  • LLM systems cannot consume Publish pages directly via their public URLs

This prevents Obsidian Publish sites from being used as portable, machine-readable knowledge sources in modern AI-assisted workflows.

Proposed solution

Provide a supported, non-JavaScript way to retrieve published Obsidian content.

This could be implemented in several acceptable ways (implementation details are flexible):

  • Server-rendered HTML at canonical Publish URLs (with correct HTTP status codes), or
  • A documented, stable endpoint for raw or rendered page content (e.g. markdown or HTML), or
  • A query parameter or alternate route that serves non-JS content for automated readers

The key requirement is that published content be retrievable without executing JavaScript, while preserving the existing SPA experience for browsers.
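As a rough illustration of the first two options, here is a framework-agnostic Python sketch of a request handler that returns a status, content type and body. It assumes a hypothetical in-memory `PAGES` dict, a hypothetical `/app.js` client bundle, and a hypothetical `?format=markdown` query parameter; none of these are real Obsidian Publish routes or APIs.

```python
from html import escape
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical published content: URL path -> (title, list of paragraphs).
PAGES = {
    "/Getting started": ("Getting started", ["Install the app.", "Open a vault."]),
}


def render_page(title, paragraphs):
    """Server-render a page, keeping the (hypothetical) SPA bundle attached."""
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    article = f"<article><h1>{escape(title)}</h1>{body}</article>"
    # The content sits inside the app container, so non-JS clients can read it
    # and the client script can still take over once JavaScript runs.
    return (
        f"<!doctype html><html><head><title>{escape(title)}</title></head>"
        f'<body><div id="app">{article}</div>'
        '<script src="/app.js"></script></body></html>'
    )


def respond(url, pages=PAGES):
    """Return (status, content_type, body) for a request URL."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    fmt = parse_qs(parts.query).get("format", ["html"])[0]
    page = pages.get(path)
    if page is None:
        # A real 404 status lets crawlers and archivers tell the page is missing.
        return (404, "text/html; charset=utf-8",
                "<!doctype html><html><body><h1>Not found</h1></body></html>")
    title, paragraphs = page
    if fmt == "markdown":
        # Alternate representation for automated readers (hypothetical parameter).
        markdown = f"# {title}\n\n" + "\n\n".join(paragraphs) + "\n"
        return 200, "text/markdown; charset=utf-8", markdown
    return 200, "text/html; charset=utf-8", render_page(title, paragraphs)
```

For example, `respond("/Getting%20started")` returns 200 with the text already in the HTML, `respond("/Missing")` returns a real 404, and `respond("/Getting%20started?format=markdown")` returns the raw markdown. Embedding the pre-rendered article inside the app container is one way to keep the browser experience unchanged, assuming the client script can replace or hydrate that content.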

Current workaround (optional)

The only reliable workaround today is to manually upload markdown files to LLM systems.

This is not ideal because:

  • It must be repeated every time the content is needed
  • It is not portable across LLM providers
  • It undermines the purpose of Obsidian Publish as a canonical, shareable source of truth

I would love for Publish to produce pages that at least function in a basic way when JavaScript isn’t available. Currently, if JS doesn’t load, you get nothing: just a blank page. The graph, I understand. But refusing to show text and links? People deserve better.

Related:


Use case or problem

Currently, when you click “View source” on an Obsidian Publish website, you get JavaScript instead of content. Bots can’t see that text, so the website’s content is not included in the training of AI models.

Proposed solution

At least allow users to opt in to serving the page text directly in the HTML, so that bots can read the content.

Current workaround (optional)

Related feature requests (optional)

This is my own personal bugbear, but a docs site really shouldn’t be reliant on JS to render - it’s just a bunch of text. Rule of Least Power and all that.

But if you don’t agree from a code-elegance point of view, then perhaps the docs being invisible to LLM agents is a reason? I just encountered this when Claude Code tried to use the docs and got knocked back, reporting: “The docs site is JS-rendered so WebFetch can’t read it.”

Use case or problem

For permanent content (e.g. documentation), I really want to end up with static web pages. I’ve briefly tried Obsidian Publish, but I have a few issues with it: 1) it’s noticeably slower than a static site, 2) it relies on a specific server “somewhere”, and 3) it can’t be saved for long-term use (e.g. by the Internet Archive).

Proposed solution

This is not about trying to avoid any fees. On the contrary, I’ll gladly pay a fair amount for a mechanism that gives me a set of static files to replicate anywhere (again: for long-term permanence, i.e. years or decades). My proposal would be to provide an export feature for Obsidian Publish; go ahead, ask a hefty sum for it. Just give me the option to choose this route. Publish is a fantastic tool, so don’t force me to buy into dynamically served web pages, even for content that is only meant to change occasionally.
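The core of such an export could be fairly small. The Python sketch below is hypothetical and deliberately minimal: it turns a dict of note names to markdown text into standalone HTML files, handling only paragraphs and `[[wikilinks]]` (with an optional `|alias`), and it uses a made-up lowercase-and-hyphens file naming scheme. A real exporter would also need a full markdown renderer, embeds, attachments and so on.

```python
import html
import re
from pathlib import Path

# Matches [[Target]] or [[Target|alias]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def slug(name: str) -> str:
    # Hypothetical naming scheme: lowercase, whitespace -> hyphens.
    return re.sub(r"\s+", "-", name.strip().lower()) + ".html"


def render_inline(text: str) -> str:
    """Escape text and turn wikilinks into relative <a> links."""
    out, last = [], 0
    for m in WIKILINK.finditer(text):
        out.append(html.escape(text[last:m.start()]))
        target, label = m.group(1), m.group(2) or m.group(1)
        out.append(f'<a href="{html.escape(slug(target))}">{html.escape(label)}</a>')
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)


def export_site(notes: dict[str, str]) -> dict[str, str]:
    """Map each note to a standalone HTML file: {filename: html}."""
    files = {}
    for name, markdown in notes.items():
        paragraphs = [p.strip() for p in markdown.split("\n\n") if p.strip()]
        body = "".join(f"<p>{render_inline(p)}</p>" for p in paragraphs)
        files[slug(name)] = (
            '<!doctype html><html><head><meta charset="utf-8">'
            f"<title>{html.escape(name)}</title></head>"
            f"<body><h1>{html.escape(name)}</h1>{body}</body></html>"
        )
    return files


def write_site(files: dict[str, str], out_dir: str) -> None:
    """Write the exported pages to a directory that any static host can serve."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, page in files.items():
        (out / filename).write_text(page, encoding="utf-8")
```

Because the output is just plain files with relative links, it could be copied to any static host or archived as-is, which is the long-term permanence asked for above.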

Current workaround (optional)

Generating static content can be done with third-party tools, but they are cumbersome, have no clear long-term support commitment, and don’t work as seamlessly as Obsidian Publish.

Related feature requests (optional)