Use case or problem
Google has confirmed that it trains its AI models on publicly accessible websites, and it has provided an opt-out mechanism.
Proposed solution
Use the Google-Extended product token in the site's robots.txt to disallow scraping of Publish sites for training data.
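For reference, the opt-out would be a robots.txt entry along these lines (a sketch based on Google's published token name; Google-Extended controls AI-training use without affecting regular Googlebot search indexing):

```
# Opt out of Google's AI-training use of this site's content.
# This token does not affect normal Google Search crawling.
User-agent: Google-Extended
Disallow: /
```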
I think this should be the default behavior; people can opt in later if it suits them.
I recognize that Google may not even respect the directive, but I would rather not imply that permission was ever given.
Current workaround (optional)
No known workaround exists, other than blocking all web crawlers or putting everything behind a password.