The US-Iran war has delivered a critical lesson for IT leaders.
Enterprises have always had to deal with bad data in their environments, whether from someone cutting corners, an ancient database that everyone is scared to delete, or conflicting systems absorbed during any of the dozens of acquisitions over the last decade. But AI is going to supercharge those data problems, and ignoring them is no longer a viable choice.
Consider the US military’s bombing of an Iranian girls’ school on Feb. 28, which killed at least 165 people, most of them children. According to The New York Times, military investigators say that the cause of the incorrect bombing was bad data. Apparently, that building had been — years earlier — used by the Iranian military. The building was separated from the military compound and converted to a school a decade ago, but no one updated the US intelligence records. That’s why the AI-powered targeting system selected it for a bombing strike.
“Officers at U.S. Central Command created the target coordinates for the strike using outdated data provided by the Defense Intelligence Agency,” The Times reported. “Military targeting is very complex and involves multiple agencies. Many officers would have been responsible for verifying that the data is correct, and officers at Central Command are responsible for checking the information they receive from the Defense Intelligence Agency or another intelligence agency. But in a fast-moving situation, like the opening days of a war, information is sometimes not verified.”
To be clear, the investigation shows that AI is not to blame for the school bombing; the error came from faulty data and people who didn’t verify it. As The Guardian pointed out, “The school appeared in Iranian business listings. It was visible on Google Maps. A search engine could have found it. Nobody searched.”
This gets us into the practical logistical realities, both for a massive military complex delivering a large number of simultaneous attacks and for an enterprise leveraging billions of bits of data being crunched by genAI or autonomous agents. It’s all but impossible for anyone to verify every single data point.
Remember that the key advantage of AI deployments is that they can deal with petabytes of data in ways that human teams cannot.
The school bombing tragedy is an extreme example, but it reinforces the concern that AI is going to use whatever data it can access. That’s especially dangerous with autonomous systems, which will assume the data is accurate and leverage that data to make decisions and take action. This is every bit as true when a hospital is analyzing test results, a retailer is trying to project product assortment needs, or a manufacturer is trying to predict how much raw material it needs for upcoming projects.
IT professionals know better. For dozens of reasons, outdated or flawed data sits in the system, and they understand how it got there in the first place. What doesn't make sense, yet is still understandable, is why no one has tried to verify all of that data and remove what is bad.
IT leaders are busy juggling the 67 projects already on their plates. Assigning someone to do a deep dive into petabytes of data, across all divisions, business units, and subsidiaries globally, somehow trying to sniff out flawed data, is never going to rise to the top of the IT director's triage list.
The task above sounds like a perfect assignment for generative AI. But what if it hallucinates while it is trying to verify data?
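One way to hedge against that risk is to never accept a model's verdict about a record on its own. A minimal sketch, assuming a hypothetical `independent_check` (any deterministic lookup, such as the business-listing or map search The Guardian described): accept or wipe only when the model and the independent check agree, and route every disagreement to a human.

```python
# Hypothetical triage guard: a genAI verifier's verdict is treated as
# untrusted input, not as ground truth. The record is auto-accepted or
# auto-wiped only when a deterministic, independent check concurs.
def triage(record, model_says_valid, independent_check):
    """Return 'accept', 'wipe', or 'human_review' for one record."""
    check_says_valid = independent_check(record)
    if model_says_valid and check_says_valid:
        return "accept"        # both sources agree the data is good
    if not model_says_valid and not check_says_valid:
        return "wipe"          # both sources agree the data is bad
    return "human_review"      # disagreement: a hallucination is possible
```

The design choice is that a hallucination can only ever cost a human review, never a silent acceptance or deletion.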
It potentially gets worse. A lot of this data was put into the system when acquisition databases were absorbed. During the first months of assimilating an acquired company, there are a ton of things that have to happen to keep revenue flowing. Verifying the legitimacy of databases typically doesn't make the cut.
But today, years after that data was absorbed from a team that may no longer be around, what procedure could meaningfully evaluate that old data for accuracy? And the longer such an evaluation is delayed, the larger the number of errors that will permeate the environment.
An IT working group could use a variety of guidelines to weed out such data, not by determining the accuracy of the old data, but by identifying large chunks of data that can simply be wiped. An example might be: “Any prospect list that is more than 10 years old should be automatically wiped, given the strong chance that little to none of that data would be viable.”
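A rule like that is cheap to automate because it needs no judgment about any individual record, only an age cutoff. A minimal sketch, assuming each record carries a hypothetical `last_verified` timestamp:

```python
from datetime import datetime, timedelta

# The "10 years old" cutoff from the working-group guideline above.
MAX_AGE = timedelta(days=10 * 365)

def partition_by_age(records, now):
    """Split records into (keep, wipe) lists using a simple age cutoff."""
    keep, wipe = [], []
    for rec in records:
        age = now - rec["last_verified"]
        (wipe if age > MAX_AGE else keep).append(rec)
    return keep, wipe

# Hypothetical prospect list for illustration.
prospects = [
    {"name": "Acme Corp", "last_verified": datetime(2012, 3, 1)},
    {"name": "Globex", "last_verified": datetime(2024, 6, 15)},
]
keep, wipe = partition_by_age(prospects, now=datetime(2025, 1, 1))
```

In practice the "wipe" list would first be archived or logged rather than deleted outright, but the point stands: whole chunks of stale data can be cleared without evaluating each record's accuracy.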
David Neuman, the COO at consulting firm Acceligence, pointed out that enterprises should also identify databases that should be retained for as long as possible, “such as scientific data, especially meteorological data.”
This brings us back to agentic systems. When these autonomous agents are rummaging through your environment trying to perform complex tasks and find obscure answers, they are going to run into that bad data and act on it.
That is why data cleaning is now urgent. Five years ago, bad data would likely have slowed things down, but few workers would have been likely to access it, let alone rely on it. Not so with AI agents. Unless they are told otherwise, they treat all data as valid.
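"Unless they are told otherwise" suggests one interim defense while the cleanup backlog is worked off: a gatekeeper between the data layer and the agent that refuses to serve anything that has not been verified recently. A sketch under assumed names (`last_verified`, the one-year threshold, and the record shape are all illustrative):

```python
from datetime import datetime, timedelta

# Assumed freshness threshold; a real deployment would tune this per dataset.
STALE_AFTER = timedelta(days=365)

class StaleDataError(Exception):
    """Raised instead of handing unverified data to an autonomous agent."""

def fetch_for_agent(record, now):
    """Gatekeeper: return the payload only if the record is recently verified."""
    if now - record["last_verified"] > STALE_AFTER:
        raise StaleDataError(
            f"record {record['id']} last verified "
            f"{record['last_verified']:%Y-%m-%d}; needs human review first"
        )
    return record["payload"]
```

The agent never sees stale records at all; they surface as explicit errors that can be queued for human verification instead of silently driving decisions.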
Do you want the benefits of accelerated AI systems, and especially autonomous ones? You'd better yank a bunch of people off of line-of-business projects and figure out a way to sniff out and remove that bad data before an agent finds it.