Training Meta’s Generative AI with My Books: A Successful Endeavor

When The Atlantic Revealed That Thousands of Books Were Used to Train Meta’s AI, Authors Reacted with Outrage

Last month, The Atlantic reported that Meta, the tech giant, had used tens of thousands of books without permission to train its AI language model. This revelation sparked anger among well-known authors, who called it a clear case of corporate wrongdoing. The outrage intensified when The Atlantic released a searchable database of the affected books, showing that even renowned authors such as Lauren Groff had their work used without consent. The article stated, “The future promised by AI is written with stolen words.”

Initially, I couldn’t fully grasp the scale of the response or the claim that generative AI relies on mass theft. Perhaps I was envious of the attention famous authors were receiving for having been singled out as victims. However, I thought I might better understand their anger if my own work was being pirated for AI purposes. As it turns out, it is. Yesterday, I searched The Atlantic’s database using my name and discovered that three of the ten books I have authored or co-authored were included. It was an exciting moment for me to join the ranks of the aggrieved. Yet despite my efforts to feel wronged, I found myself surprisingly unmoved. What was wrong with me?

The angry authors have emphasized the fact that their work was used without permission. This issue lies at the heart of a lawsuit filed in California by comedian Sarah Silverman and authors Richard Kadrey and Christopher Golden, who argue that Meta failed to seek their consent before using snippets of their text, known as “tokens,” to train its AI. Meta used their books in ways they did not anticipate and now disapprove of. While Meta has filed a motion to dismiss the lawsuit, the question of whether its actions constitute copyright infringement will ultimately be determined by the courts. The question of permission, however, is separate from the question of legality.

One of the inherent realities and joys of being an author is that your work will be used in unexpected ways. Philosopher Jacques Derrida spoke of “dissemination,” where authors, like plants releasing seeds, separate from their published work. Readers, viewers, and listeners not only can but must interpret that work in different contexts. A retiree reading a Haruki Murakami novel recommended by their grandchild, a high school student skimming Shakespeare for class, or my mother’s gardener reading my book on play at her suggestion—all of these uses require no permission and are integral to the nature of influence in general. Successful art surpasses its creator’s intentions.

However, internet culture has transformed permission into a moral entitlement. Many authors are present online and can swiftly correct any misunderstandings regarding their work. Likewise, hordes of fans are ready to enforce their interpretations of a book, movie, or album while rejecting any “wrong” interpretations. The controversy over Books3, the data set of books at issue, reveals this impulse to view certain interpretations as off-limits.

Perhaps Meta is not the ideal reader. Perhaps having my prose segmented into tokens is not how I would want it to be read. But who am I to dictate the purpose or benefits of my writing, even to a trillion-dollar corporation? To lament a single unintended use of my work is to undermine all the other unforeseen uses. As a writer, I find that disheartening.

Now, I must admit, I feel a little bored, dare I say it, by the notion that Meta has stolen my life. If the theft and aggregation of works in Books3 are objectionable on moral or legal grounds, they should be condemned regardless of whether those works end up in a particular technology company’s AI model. However, this doesn’t seem to be the case. The Books3 database itself was uploaded as a form of resistance against corporate giants. The individual who initially shared the repository intended to restore some control over the future to ordinary people, including authors. Meanwhile, Meta claims that the next generation of its AI model, which may or may not involve Books3 in its training data, is “free for research and commercial use.” While this statement warrants scrutiny, it also complicates the situation. Moreover, shortly after The Atlantic released its search tool for Books3, a writer shared a link allowing access to the feature without subscribing to the magazine. In other words, people can now express outrage, free of charge, about others obtaining writers’ work for free.

As both a writer and a citizen of the future, I find it hard to form a conclusive opinion about this matter. Theft is a fundamental sin of the internet, sometimes labeled as piracy (when software or books are uploaded) and other times hailed as innovation (such as when Google indexed the entire internet without permission) or even seen as liberation. AI merely continues this ambiguity. Knowing that a portion of my writing, along with trillions of other text snippets from sources like Amazon reviews and Reddit posts, has made it into an AI training set, I struggle to draw any significant or definitive conclusions about the Books3 story.

And what about those Amazon reviewers, Redditors, Wikipedia contributors, long-abandoned bloggers, corporate copywriters, or even search engine optimization enthusiasts who fill the internet with content? Their work, too, will likely be absorbed by these massive language models. The sheer volume of textual material available for training AI models dwarfs even the nearly 200,000 books in Books3.

I understand the inclination to hold literary works in higher regard than introductions to banana bread recipes, posts on the Am I the Asshole subreddit, or step-by-step instructions for replacing a water inlet valve. But that inclination is also pretentious. We who write for magazines and publish books are professionals with a personal investment in the gravity of our authorship. Nevertheless, we are a small minority. Almost anyone can write millions of words over the years on social media, in texts and emails, in reports and memos for their jobs. While I cherish books and hold them in high regard, as a published author and professional writer, I am probably the person least at risk of losing my connection to the written word and its rewards. If an AI composite of Stephen King and Yelp can outperform me, then who am I to call myself a writer?

I became an author because language provides a unique medium for experimenting with ideas. Words and sentences are flexible, and texts emerge from the subtext’s hidden depths. What I say encompasses what I don’t say and allows room for what you perceive. Once my books are bound, published, boxed, and shipped, they find their way to places I could never have predicted. They serve as vessels for ideas, but they can also be used as doorstops, bug killers, or the last inch of a makeshift laptop stand for an important Zoom call. They can even be disassembled into chunks and reassembled by the alien mind of a peculiar machine. And why not? I am an author, yes, but I am also a man who has arranged some words amidst countless others who have done the same. If authorship is merely vanity, then let the machines release us from our misery.

Denial of responsibility! Vigour Times is an automatic aggregator of Global media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, and all materials to their authors. For any complaint, please reach us at – [email protected]. We will take necessary action within 24 hours.