AI chatbots like Google's Bard and OpenAI's wildly popular ChatGPT are trained on massive amounts of data scraped from the internet. Unsurprisingly, every AI lab is racing to make its chatbot smarter using all the data it can get its hands on, a tactic that has proven controversial because the labs rarely pay the creators or owners of the scraped content. For example, OpenAI recently launched GPTBot, a web crawler that sifts through information it comes across on the internet.
"Web pages crawled with the GPTBot user agent may potentially be used to improve future models," OpenAI explains on its website. The Microsoft-backed company clarifies, however, that the bot won't scrape websites that serve paywalled content. It also won't collect content that is deemed personally identifiable or that violates OpenAI's own safety guidelines and policies.
GPTBot is not the only internet crawler out there. Stable Diffusion and LAION use Common Crawl, a non-profit that holds petabytes of internet data dating back to 2008. If you want to block GPTBot, you may also want to block Common Crawl's CCBot web scraper. For context, Google also used the Common Crawl dataset to train Bard, its own ChatGPT rival.
How to disable OpenAI's GPTBot?
Preventing OpenAI's web crawler from accessing the contents of a website is a fairly simple process. All it needs is a slight...
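As a rough illustration, and assuming the change in question is a standard robots.txt rule (the usual opt-out mechanism for web crawlers), the Python sketch below checks whether a robots.txt file would turn away GPTBot and CCBot. The user-agent names come from the text above; the `Disallow: /` rules, the `is_blocked` helper, and the example.com URL are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. The GPTBot and CCBot user-agent tokens come
# from the article; the "Disallow: /" rules are the assumed block-everything form.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/post.html"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        status = "blocked" if is_blocked(ROBOTS_TXT, agent, page) else "allowed"
        print(agent, status)
```

Keep in mind that robots.txt is advisory: it only works for crawlers that choose to honor it, and it does nothing about pages that have already been collected.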
Read Full Story: https://news.google.com/rss/articles/CBMiRmh0dHBzOi8vd3d3LnNsYXNoZ2Vhci5jb20vMTM2MzkxNy93aGF0LWlzLW9wZW5haS1ncHRib3QtYmxvY2stY2hhdGdwdC_SAQA?oc=5