Disallow LLM (AI/ML) Scraping Using Google-Extended on Obsidian Publish

Use case or problem

Google has confirmed that it trains its AI models on publicly accessible websites.

It has since provided an opt-out mechanism for this.

Proposed solution

Use the Google-Extended product token in the site’s robots.txt to disallow scraping Publish sites for training data.
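For reference, the entry would look something like the following (Google-Extended is the product token Google documents for opting out of AI training; since Publish users can't edit robots.txt themselves, this would need to be added on Obsidian's side):

```
User-agent: Google-Extended
Disallow: /
```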

I think this should be the default behavior; people can opt in down the line if it suits them.

I recognize that Google may not even respect it, but I would rather not even imply that permission was ever given.

Current workaround (optional)

No known workaround exists other than blocking all web crawlers or putting everything behind a password.
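Blocking all web crawlers would mean a blanket rule like the one below, which also removes the site from ordinary search results, so it is a much blunter instrument than targeting Google-Extended alone:

```
User-agent: *
Disallow: /
```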

I think this is too similar to the related FR you linked. I am going to close this and continue the conversation there.