OpenAI desperate to avoid explaining why it deleted pirated book datasets

OpenAI may soon be forced to explain why it deleted a pair of controversial datasets composed of pirated books, and the stakes could not be higher.

At the heart of a class-action lawsuit from authors alleging that ChatGPT was illegally trained on their works, OpenAI’s decision to delete the datasets could end up being a deciding factor that gives the authors the win.

It’s undisputed that OpenAI deleted the datasets, known as “Books 1” and “Books 2,” prior to ChatGPT’s release in 2022. Created by former OpenAI employees in 2021, the datasets were built by scraping the open web, with the bulk of their data seized from a shadow library called Library Genesis (LibGen).

Ashley Belanger

Author of this blog post from Arfi Foundation.