Pluribus Training Data: A Necessary Evil?
Published: Dec 27, 2025 15:43
1 min read
Simon Willison
Analysis
This short blog post uses a reference to the TV show "Pluribus" to illustrate the author's conflicted feelings about the data used to train large language models (LLMs). The author draws a parallel between the show's characters being forced to consume Human Derived Protein (HDP) and the ethical compromises made in using potentially problematic or copyrighted data to train AI. While acknowledging the potential downsides, the author seems to suggest that the benefits of LLMs outweigh the ethical concerns, similar to the characters' acceptance of HDP out of necessity. The post highlights the ongoing debate surrounding AI ethics and the trade-offs involved in developing powerful AI systems.
Key Takeaways
- LLM training often involves ethical compromises regarding data sources.
- The benefits of LLMs may be seen as outweighing the ethical concerns in some cases.
- The analogy to "Pluribus" highlights the feeling of being forced to accept a less-than-ideal situation.
Reference
“Given our druthers, would we choose to consume HDP? No. Throughout history, most cultures, though not all, have taken a dim view of anthropophagy. Honestly, we're not that keen on it ourselves. But we're left with little choice.”