Sites that don’t want their content used by Google to train its large language models for generative AI can now opt out by adding a new user agent, Google-Extended, to their robots.txt file.
By Jon Henshaw
09/29/2023 (Last updated on 09/29/2023)
Tech companies training large language models (LLMs) have taken the stance that you must opt out if you don’t want your copyrighted content taken freely, without your permission, and used to generate new content they can sell for a profit.
OpenAI, the company behind ChatGPT, was one of the first companies to give sites the option to opt out. Now Google has released a similar method.
On September 28, 2023, Danielle Romain, VP of Trust at Google, announced the creation of a new user agent (UA) that can be used in a site’s robots.txt file to request that Google not use the site’s content for LLM training.
Unlike other Google UAs, like Googlebot and Googlebot-Image, Google-Extended doesn’t crawl web pages. Instead, it tells Google, after it crawls your site, to use your content only to index and return your pages in its search results, not to train its LLMs. That means sites can block Google from using their content for generative AI without harming their SEO.
How to use the Google-Extended user agent
Google-Extended uses the same directives as other UAs in the robots.txt file. If you want to block Google from training on any of your site content, you can add a directive that disallows everything.
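As a rough sketch of what a disallow-all rule for Google-Extended can look like, and one way to sanity-check it, the snippet below embeds the two robots.txt lines in a string and tests them with Python's standard urllib.robotparser. The example.com URL and the is_allowed helper are illustrative assumptions, not part of Google's announcement. Since Google-Extended doesn't crawl, this only checks how the rules parse; it says nothing about how Google applies them.

```python
from urllib.robotparser import RobotFileParser

# The two lines you would add to robots.txt to opt the whole site out
# of LLM training. No other user agent is mentioned, so Googlebot and
# the rest are unaffected by this group.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and report whether
    the given user agent is allowed to use the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"  # placeholder URL (assumption)
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Under these rules, the check should report Google-Extended as blocked and Googlebot as still allowed, which matches the point that opting out of training need not affect search indexing.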
If you’re OK with Google training with some of...
Read Full Story: https://news.google.com/rss/articles/CBMiWmh0dHBzOi8vd3d3LmNveXdvbGYubmV3cy9zZW8vZ29vZ2xlLWFubm91bmNlcy1tZXRob2QtZm9yLXNpdGVzLXRvLW9wdC1vdXQtb2YtbGxtLXRyYWluaW5nL9IBAA?oc=5