Publish: let the user provide a custom robots.txt (block crawling, LLM content protection)

For Obsidian Publish, it would be nice to be able to configure a robots.txt file to gain additional protection against LLM crawlers.

Use case or problem

When a site is public, there is a risk of LLMs training on proprietary content and reproducing it without citing the source.

Proposed solution

Add an option in settings to create a robots.txt file that blocks known LLM crawlers, for example:

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
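
Going further, other AI crawlers publish their own user agents. A sketch of a fuller block list, assuming the vendor-documented names GPTBot (OpenAI), Google-Extended (Google's AI-training product token), and anthropic-ai (Anthropic) are still current; note that robots.txt is purely advisory, so compliant crawlers will honor it but nothing enforces it:

# OpenAI's crawler (vendor-documented name, may change)
User-agent: GPTBot
Disallow: /

# Google's product token for opting out of AI training
User-agent: Google-Extended
Disallow: /

# Anthropic's crawler (assumed agent name)
User-agent: anthropic-ai
Disallow: /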

Current workaround (optional)

Add a password to the published site.
