Obsidian Publish: provide static content so the Wayback Machine and LLMs can crawl Publish sites without running JavaScript

Use case or problem

It appears the Internet Archive cannot make snapshots of Obsidian Publish sites. Only 404 pages appear in the Wayback Machine.

I have confirmed that my Publish site allows web crawlers and is discoverable. The issue appears to be unique to the Internet Archive because alternative archive sites (like archive.today) can create snapshots without issue.

Proposed solution

Make it possible for the Internet Archive to properly crawl and create snapshots of Publish pages.

Current workaround (optional)

I have not been able to come up with a workaround.

Related feature requests (optional)

n/a


Just to clarify, we do not block the Internet Archive. We believe their crawler isn't able to run JavaScript, and therefore can't save Obsidian Publish pages (which are not static HTML).

From Claude AI:

This is a fundamental architectural problem. Obsidian Publish is a JavaScript single-page application (SPA) where content is rendered client-side. When the Wayback Machine’s crawler fetches a page, it captures the initial HTML shell before JavaScript executes, which is essentially an empty container—hence the 404 you see. The Wayback Machine has historically struggled with JS-heavy sites because it archives the raw HTTP response rather than the rendered DOM.
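To make this concrete, here is a minimal sketch of what a non-JS client sees. The shell markup below is hypothetical (similar in spirit to a client-rendered page, not the actual Publish source): extracting the visible text from the raw HTTP response yields essentially nothing, because the note content would only be filled in by `app.js`, which an archiver never executes.

```python
from html.parser import HTMLParser

# Hypothetical SPA shell; not the actual Obsidian Publish markup.
SPA_SHELL = """
<!DOCTYPE html>
<html>
<head><title>My Notes</title><script src="/app.js"></script></head>
<body><div id="app"></div></body>
</html>
"""

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> contents."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())

extractor = TextExtractor()
extractor.feed(SPA_SHELL)
# Only the <title> survives; the note body is an empty container.
print(extractor.chunks)  # → ['My Notes']
```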

Obsidian could potentially add server-side rendering or a static HTML fallback for crawlers.

Use case or problem

I use Obsidian Publish as a public, canonical knowledge base. Increasingly, I also use web-based Large Language Model (LLM) systems (e.g. ChatGPT and similar tools) as readers and research assistants over public web content.

Currently, Obsidian Publish pages are not reliably readable by non-JavaScript clients. Requests to canonical Publish URLs always return a JavaScript SPA shell, and the actual page content is fetched client-side via JavaScript from an internal endpoint. As a result:

  • Non-JS clients cannot reliably read Publish page content
  • Page existence cannot be determined via HTTP semantics
  • LLM systems cannot consume Publish pages directly via their public URLs

This prevents Obsidian Publish sites from being used as portable, machine-readable knowledge sources in modern AI-assisted workflows.
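The "HTTP semantics" point above can be sketched as a toy model (not Obsidian's real server): a SPA host answers every path with the same `200` plus shell, so status codes alone cannot tell a client whether a page exists, whereas a static-HTML host would return a real `404`.

```python
# Toy model of the behavior described above; the paths and pages
# are illustrative, not real Publish endpoints.

SHELL = "<div id='app'></div>"  # empty container; content loads via JS

def spa_response(path: str) -> tuple[int, str]:
    """Hypothetical SPA behavior: 200 + the same shell for any path."""
    return 200, SHELL

def static_response(path: str, pages: dict[str, str]) -> tuple[int, str]:
    """What correct HTTP semantics would look like with static HTML."""
    if path in pages:
        return 200, pages[path]
    return 404, "Not Found"

pages = {"/notes/welcome": "<h1>Welcome</h1>"}

print(spa_response("/notes/welcome"))            # 200 with shell, no content
print(spa_response("/notes/missing"))            # 200 with shell, indistinguishable
print(static_response("/notes/missing", pages))  # real 404
```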

Proposed solution

Provide a supported, non-JavaScript way to retrieve published Obsidian content.

This could be implemented in several acceptable ways (implementation details are flexible):

  • Server-rendered HTML at canonical Publish URLs (with correct HTTP status codes), or
  • A documented, stable endpoint for raw or rendered page content (e.g. markdown or HTML), or
  • A query parameter or alternate route that serves non-JS content for automated readers

The key requirement is that published content be retrievable without executing JavaScript, while preserving the existing SPA experience for browsers.
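As a sketch of the query-parameter option above (the `format=md` parameter name and the page contents are illustrative assumptions, not an existing Publish feature): a route handler could serve raw markdown with correct status codes to automated readers, while browsers continue to get the SPA shell.

```python
from urllib.parse import urlparse, parse_qs

SPA_SHELL = "<div id='app'></div>"  # placeholder for the JS shell

# Hypothetical published vault; paths map to raw markdown sources.
PAGES = {"/notes/welcome": "# Welcome\n\nHello from my vault."}

def handle(url: str) -> tuple[int, str, str]:
    """Return (status, content_type, body) for a request URL."""
    parsed = urlparse(url)
    wants_raw = parse_qs(parsed.query).get("format") == ["md"]
    page = PAGES.get(parsed.path)
    if wants_raw:
        # Non-JS route: real content, and a real 404 for missing pages.
        if page is None:
            return 404, "text/plain", "Not Found"
        return 200, "text/markdown", page
    # Browsers keep the existing SPA experience.
    return 200, "text/html", SPA_SHELL

print(handle("https://example.publish.site/notes/welcome?format=md")[0])  # → 200
print(handle("https://example.publish.site/notes/missing?format=md")[0])  # → 404
```

The same status-code logic would apply equally to the server-rendered-HTML option; only the body (rendered HTML vs. raw markdown) differs.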

Current workaround (optional)

The only reliable workaround today is to manually upload markdown files to LLM systems.

This is not ideal because:

  • It must be repeated every time the content is needed
  • It is not portable across LLM providers
  • It undermines the purpose of Obsidian Publish as a canonical, shareable source of truth

I would love for Publish to produce pages that at least function in a basic way when JavaScript isn’t available. Currently if JS doesn’t load, you get nothing — just a blank page. The graph, I understand. But refusing to show text and links? People deserve better.
