Meta Admitted to Illegally Downloading Millions of Books for AI Training

Meta, the parent company of Facebook, Instagram, and WhatsApp, has admitted to illegally downloading vast quantities of books to train its artificial intelligence (AI). Documents presented in a US lawsuit against the company, including internal emails, confirm the accusation.

The case was brought by artists and writers who claim Meta downloaded copyrighted works from illicit sources without compensation. This data was used to train Meta's language model, which can generate content and answer user questions.

Meta previously acknowledged downloading entire databases from pirate sources like LibGen. However, the newly released emails reveal further details: in addition to the 80.6 TB downloaded from LibGen, Meta downloaded 35.7 TB of books from another platform and at least 81.7 TB of data from Anna's Archive, a shadow library that distributes works without the permission of rights holders.

Meta's position could worsen because of the method used: a torrent client typically uploads pieces of a file to other users while it downloads, so the company also helped other users obtain the books illegally by acting as a seeder. Meta has not yet provided the details the plaintiffs requested about the downloads.

Emails exchanged by Meta employees confirm they were aware that downloading books via torrent from databases like LibGen was illegal and could jeopardize business contracts or complicate the company's future.

"Downloading torrent from a corporate laptop doesn't seem right," said Meta researcher Nikolay Bashlykov in one email, accompanied by a laughing emoji. In another message, an employee suggests that "OpenAI's model is probably trained" on similar databases, while another says using a VPN to mask the connection during download would be a viable alternative.

These exchanges indicate that Meta tried to conceal its activity by using servers outside the company so that the downloads could not be traced back to Facebook's parent company. Employees even changed the torrent client's settings to keep seeding, that is, uploading to other users, to the minimum possible.

CEO and co-founder Mark Zuckerberg is also mentioned. In one message, an employee reports that the "decision to use" LibGen as a source came "after the situation escalated to MZ," indicating that he approved the process or was at least informed of it. This contradicts previous statements denying the executive's involvement.

Meta has yet to comment on the publication of the new evidence. Previously, the company suggested that AI training from entire databases and books was a matter of "fair use" - the acceptable use of intellectual property for certain purposes without requiring authorization or payment to the owner.

With the evidence in hand, the plaintiffs' lawyers now want to recall certain witnesses, especially since their initial testimony now appears contradictory. They believe that adding the argument that the company tried to hide the downloads, and may have helped make the files available to others via torrent, could worsen Meta's position in the case.
