For Obsidian Publish, it would be nice to be able to configure a robots.txt file to gain additional protection against LLM crawlers scraping the site.
Use case or problem
When the site is public, there is a risk of LLMs training on or stealing proprietary content without citing sources.
Add an option in settings to create a robots.txt file to protect content from LLM crawlers.
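For reference, a minimal robots.txt along these lines might look like the sketch below. The user-agent names shown are the publicly documented crawler identifiers at the time of writing (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Gemini training); the list is illustrative, not exhaustive, and new crawlers appear regularly:

```txt
# Opt out of OpenAI's crawler
User-agent: GPTBot
Disallow: /

# Opt out of Common Crawl (a common LLM training data source)
User-agent: CCBot
Disallow: /

# Opt out of Google's AI training crawler (does not affect Search indexing)
User-agent: Google-Extended
Disallow: /

# Everything else may still index the site
User-agent: *
Allow: /
```

Note that robots.txt is advisory only: compliant crawlers honor it, but nothing technically prevents a scraper from ignoring it.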
Current workaround (optional)
Add a password
Related feature requests (optional)
Edited and expanded the scope.
Hello! With OpenAI’s announcement that they will now allow us to disallow their crawler from ingesting our websites, has there been any progress on allowing us to utilize this block on our Obsidian Publish websites? Thank you!
Thank you for filing this feature request. I second this. If supporting robots.txt editing is not possible, then at least add a switch in the preferences for this.
Just chiming in to support this one: I realize that scrapers can choose to ignore robots.txt (and the disreputable ones absolutely will), but I also want to opt my content out of OpenAI, Microsoft, and related scraping.