By SuperintelligenceNews.com
Tech giants including Apple, Nvidia, and Anthropic are under fire for allegedly using thousands of YouTube videos to train their AI models without the creators’ consent, an investigation by Proof News reveals. Despite YouTube’s policies against unauthorized data harvesting, the companies used subtitles from 173,536 videos across more than 48,000 channels to feed their AI systems.
The Scope of Data Use
The dataset, known as YouTube Subtitles, includes transcripts from educational channels such as Khan Academy, MIT, and Harvard, as well as media outlets like NPR and the BBC. Popular shows like “The Late Show with Stephen Colbert” and “Last Week Tonight with John Oliver” also had their videos used. Notable YouTube personalities, including MrBeast and PewDiePie, found their content used in this unauthorized manner.
Creators’ Outcry
David Pakman, host of “The David Pakman Show,” discovered nearly 160 of his videos were part of this dataset. “No one came to me and said, ‘We would like to use this,’” Pakman stated, emphasizing the lack of consent and the impact on creators’ livelihoods. Dave Wiskus, CEO of Nebula, echoed these sentiments, calling it “disrespectful” and highlighting the potential exploitation of artists by generative AI.
Corporate Responses and Legal Challenges
Anthropic confirmed using the Pile dataset, which includes YouTube Subtitles, to train its AI, arguing that the dataset’s public availability justifies its use. Salesforce and other companies that trained models on the Pile offered similar defenses, despite the controversy. Notably, Nvidia and Apple declined to comment on the allegations.
The Legal Landscape
The situation parallels the controversy surrounding Books3, another dataset within the Pile, which prompted lawsuits from authors whose works were included without permission. These cases underscore the ongoing legal and ethical battles over AI training data.
Future Implications
As AI continues to evolve, the use of publicly available data for training purposes remains contentious. Creators like Pakman worry about the implications for their work and the broader industry, advocating for compensation and regulation to address unauthorized use.