Google has admitted to training their AI models on publicly accessible websites.
They have provided an opt-out: adding the Google-Extended product token to the site's robots.txt tells Google's crawlers not to use Publish sites as training data.
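For reference, this is what the opt-out looks like per Google's own documentation of the Google-Extended token. A minimal sketch, assuming Publish could serve a user-controlled robots.txt (which it currently can't, hence the linked request below):

```
# Opt out of Google's AI training crawler (Google-Extended)
# while leaving normal Google Search indexing untouched.
User-agent: Google-Extended
Disallow: /
```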
I think this should be the default behavior; people can opt in later if it suits them.
I recognize that Google may not even respect it, but I would rather not imply that permission was ever given.
No known workaround exists other than blocking all web crawlers or putting everything behind a password.
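For completeness, the blanket version of that workaround would be a robots.txt that disallows everything. This is only a sketch, and it only deters well-behaved crawlers:

```
# Blanket opt-out: ask all crawlers to stay away.
# Note this also removes the site from regular search results.
User-agent: *
Disallow: /
```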
- Publish: let the user provide a custom robot.txt (Block crawling, LLM Content protection) - #3 by BearDroid77