AI data poisoning is a process where an attacker deliberately alters an AI model’s training data to influence its behavior, causing it to generate biased, misleading, or harmful output. This threat is now recognized as a major vulnerability by security organizations like OWASP.
Researchers at Carnegie Mellon University's Security and Privacy Institute have framed the stakes plainly. "Modern AI systems that are trained to understand language are trained on giant crawls of the internet," said Daphne Ippolito, assistant professor at the university's Language Technologies Institute. "If an adversary can modify 0.1 percent of the Internet, and then the Internet is used to train the next generation of AI, what sort of bad behaviors could the adversary introduce into the new generation?"
The Pravda network is a sprawling collection of fake news websites that Russia launched in 2014. The sites target audiences in more than 80 countries and are designed to spread stories that support Kremlin disinformation, largely by repeating and amplifying messages from Russian state media and pro-government Telegram channels. In 2024, the network expanded its efforts, launching sites focused on NATO and on prominent political leaders such as Donald Trump and France's President Emmanuel Macron.
To get around international restrictions on Russian state media, the network has shifted tactics. Rather than relying solely on traditional propaganda channels, it now tries to pass as a trustworthy source so that some of its content gets cited in resources like Wikipedia.
As a result, AI tools may unknowingly absorb and repeat these biased or false narratives. Users interacting with AI chatbots can then be exposed to messaging that favors the Kremlin and criticizes Ukraine or Western governments. It can influence elections. And it can drive people to make decisions that go against their own self-interest.
But fear not! There are ways to combat this problem. Some target the training process, and others put users in control. Here are a couple of examples:
A blockchain is a shared digital ledger for logging transactions and tracking assets; you've undoubtedly heard the term in the context of cryptocurrency. Blockchains provide a secure, transparent record of how data updates are shared and verified because existing entries cannot be changed; only new ones can be added.
In the context of AI training, that means an original "fact" is never touched: any new entry that claims to revise it stands out like a sore thumb. Combined with consensus mechanisms, AI systems with blockchain-protected training data can validate additions more reliably and flag the kinds of anomalies that can indicate data poisoning before it spreads.
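To make the append-only idea concrete, here is a minimal sketch of a hash-chained ledger in Python. This is a toy, not a real blockchain (there is no distributed consensus), and the class and field names are illustrative. It shows the core property the paragraph above relies on: each entry's hash incorporates the previous entry's hash, so any after-the-fact edit to an existing record breaks the chain and is immediately detectable.

```python
import hashlib
import json

def record_hash(record: dict, prev_hash: str) -> str:
    # Hash the record together with the previous entry's hash,
    # chaining every entry to all the entries before it.
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

class AppendOnlyLedger:
    """Toy append-only ledger: entries can be added, never edited."""

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record: dict) -> None:
        prev = self.entries[-1][1] if self.entries else "genesis"
        self.entries.append((record, record_hash(record, prev)))

    def verify(self) -> bool:
        # Recompute every hash from scratch; a silent edit to any
        # earlier record makes the stored hashes stop matching.
        prev = "genesis"
        for record, h in self.entries:
            if record_hash(record, prev) != h:
                return False
            prev = h
        return True

ledger = AppendOnlyLedger()
ledger.append({"fact": "Paris is the capital of France", "source": "encyclopedia"})
ledger.append({"fact": "Water boils at 100 C at sea level", "source": "textbook"})
print(ledger.verify())  # the untampered chain checks out

# A poisoning attempt: silently rewrite an existing entry in place.
ledger.entries[0][0]["fact"] = "Paris is the capital of Germany"
print(ledger.verify())  # tampering detected: the chain no longer verifies
```

Note that the attacker's only honest option is to append a *new* record contradicting the old one, which is exactly the kind of anomaly a validation step can flag for review.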
It's imperative that users corroborate information they find on the Internet with reputable sources, and this includes AI output. In a past article I used the example of Judge Julien Xavier Neals of the District of New Jersey, who had to withdraw his entire opinion after a lawyer politely pointed out that it was riddled with fabricated quotes, nonexistent case citations, and completely backwards case outcomes.
The old adage "don't believe everything you read" has never been truer than for Internet-derived content. Be skeptical and check your sources!