How to Block ChatGPT From Using Your Website Content - Search Engine Journal

Last updated Saturday, February 11, 2023 09:05 ET, Source: NewsService

There is concern about the lack of an easy way to opt out of having one’s content used to train large language models (LLMs) like ChatGPT. There is a way to do it, but it’s neither straightforward nor guaranteed to work.
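For content gathered by web crawlers, the usual opt-out signal is a robots.txt rule aimed at a crawler's user agent. The sketch below only illustrates how such a rule is evaluated. It assumes the crawler identifies itself as CCBot (the user agent of Common Crawl) and chooses to honor robots.txt, which is voluntary and is one reason the approach is not guaranteed to work. The robots.txt text, the `is_allowed` helper, and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one crawler by user agent, allows all others.
# "CCBot" is assumed here as the crawler to block; swap in another user agent as needed.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-post"  # hypothetical URL
    print(is_allowed(ROBOTS_TXT, "CCBot", page))      # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", page))  # True
```

A check like this only confirms what the file asks for. It cannot remove content that is already in a previously published dataset, and it has no effect on crawlers that ignore robots.txt.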

How AIs Learn From Your Content

Large language models (LLMs) are trained on data drawn from a wide variety of sources. Many of these datasets are open source and are freely used for training AI models.

Examples of the kinds of sources used:

Wikipedia
Government court records
Books
Emails
Crawled websites

Portals and websites give away vast amounts of information in the form of downloadable datasets.

One of these portals, the Registry of Open Data on AWS, is hosted by Amazon and offers thousands of datasets. It is only one of many such portals.

Wikipedia lists 28 portals for downloading datasets, including Google Dataset Search and the Hugging Face portal, each of which can be used to find thousands of datasets.

Datasets Used to Train ChatGPT

ChatGPT is based on GPT-3.5 and is a sibling model to InstructGPT.

The datasets used to train GPT-3.5 are the same as those used for GPT-3. The major difference between the two is that GPT-3.5 used a technique known as reinforcement learning from...

Read Full Story: https://news.google.com/rss/articles/CBMiYGh0dHBzOi8vd3d3LnNlYXJjaGVuZ2luZWpvdXJuYWwuY29tL2hvdy10by1ibG9jay1jaGF0Z3B0LWZyb20tdXNpbmcteW91ci13ZWJzaXRlLWNvbnRlbnQvNDc4Mzg0L9IBAA?oc=5
