When the AI goes haywire, bring on the humans

Hundreds of shoppers line up for blocks to buy supplies at a Costco in Garden Grove, California, US, on March 14, 2020, amid the global coronavirus outbreak.
PHOTO: Reuters

OAKLAND, Calif. - Used by two-thirds of the world's 100 biggest banks to aid lending decisions, credit scoring giant Fair Isaac Corp and its artificial intelligence software can wreak havoc if something goes wrong.

That crisis nearly came to pass early in the pandemic. As Fico recounted to Reuters, the Bozeman, Montana company's AI tools for helping banks identify credit and debit card fraud concluded that a surge in online shopping meant fraudsters must have been busier than usual.

The AI software told banks to deny millions of legitimate purchases, at a time when consumers had been scrambling for toilet paper and other essentials.

But consumers ultimately faced few denials, according to Fico. The company said a global group of 20 analysts who constantly monitor its systems recommended temporary adjustments that avoided a blockade on spending. The team is automatically alerted to unusual buying activity that could confuse the AI, which 9,000 financial institutions rely on to detect fraud across two billion cards.

Such corporate teams, part of the emerging job specialty of machine learning operations (MLOps), are unusual. In separate surveys last year, Fico and the consultancy McKinsey & Co found that most organizations surveyed are not regularly monitoring AI-based programs after launching them.

The problem is that errors can abound when real-world circumstances deviate, or in tech parlance "drift," from the examples used to train AI, according to scientists managing these systems. In Fico's case, it said its software expected more in-person than virtual shopping, and the flipped ratio led to a greater share of transactions flagged as problematic.
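To make the idea of drift concrete, here is a minimal Python sketch of the kind of check a monitoring team might run: it compares the share of online transactions in recent traffic with the share seen in training data and raises a flag when the gap exceeds a tolerance. This is not Fico's method. The function names, the channel labels, the 10-point tolerance and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    baseline_share: float  # share of online transactions in training-era data
    recent_share: float    # share of online transactions in recent data
    drifted: bool          # True when the gap exceeds the tolerance


def online_share(channels):
    """Return the fraction of transactions labelled 'online'."""
    if not channels:
        raise ValueError("need at least one transaction")
    return sum(1 for c in channels if c == "online") / len(channels)


def check_drift(baseline, recent, tolerance=0.10):
    """Flag drift when the online share moves by more than `tolerance`.

    The tolerance is a hypothetical value; a real team would tune it.
    """
    b = online_share(baseline)
    r = online_share(recent)
    return DriftReport(b, r, abs(r - b) > tolerance)


# Made-up data: mostly in-person shopping during training,
# mostly online shopping in the recent window.
baseline = ["in_person"] * 70 + ["online"] * 30
recent = ["in_person"] * 35 + ["online"] * 65

report = check_drift(baseline, recent)
print(report)
```

A production system would likely track many features at once and use a statistical test rather than a fixed gap, but the principle is the same: compare what the model sees now with what it saw in training, and alert a human when the two diverge.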

Seasonal variations, data-quality changes or momentous events - such as the pandemic - all can lead to a string of bad AI predictions.

Imagine a system recommending swimsuits to summer shoppers, not realizing that COVID lockdowns had made sweatpants more suitable. Or a facial recognition system becoming faulty because masking had become popular.

The pandemic must have been a "wake-up call" for anyone not closely monitoring AI systems because it induced countless behavioral shifts, said Aleksander Madry, director of the Center for Deployable Machine Learning at Massachusetts Institute of Technology.

Coping with drift is a huge problem for organizations leveraging AI, he said. "That's what really stops us currently from this dream of AI revolutionizing everything."

Adding to the urgency for users to address the issue, the European Union plans to pass a new AI law as soon as next year requiring some monitoring. The White House this month in new AI guidelines also called for monitoring to ensure system "performance does not fall below an acceptable level over time."

Being slow to notice issues can be costly. Unity Software Inc, whose ad software helps video games attract players, in May estimated that it would lose $110 million in sales this year, or about 8 per cent of total expected revenue, after customers pulled back when its AI tool that determines whom to show ads to stopped working as well as it once did. Also to blame was its AI system learning from corrupted data, the company said.

Unity, based in San Francisco, declined to comment beyond earnings-call statements. Executives there said Unity was deploying alerting and recovery tools to catch problems faster and acknowledged expansion and new features had taken precedence over monitoring.

Real estate marketplace Zillow Group Inc last November announced a $304 million writedown on homes it bought - based on a price-forecasting algorithm - for amounts higher than they could be resold for. The Seattle company said the AI could not keep pace with rapid and unprecedented market swings and exited the buying-selling business.

New market

AI can go awry in many ways. Most well known is that training data skewed along race or other lines can prompt unfairly biased predictions. Many companies now vet data beforehand to prevent this, according to the surveys and industry experts. By comparison, few companies consider the danger of a well-performing model that later breaks, those sources say.

"It's a pressing problem," said Sara Hooker, head of research lab Cohere For AI. "How do you update models that become stale as the world changes around it?"


Several startups and cloud computing giants in the past couple of years have started selling software that analyzes performance, sets alarms and introduces fixes, all intended to help teams keep tabs on AI. IDC, a global market researcher, estimates that spending on tools for AI operations will reach at least $2 billion in 2026, up from $408 million last year.

Venture capital investment in AI development and operations companies rose last year to nearly $13 billion, and $6 billion has poured in so far this year, according to data from PitchBook, a Seattle company tracking financings.

Arize AI, which raised $38 million from investors last month, enables monitoring for customers including Uber, Chick-fil-A and Procter & Gamble. Chief Product Officer Aparna Dhinakaran said she struggled at a previous employer to quickly spot AI predictions turning poor and friends elsewhere told her about their own delays.

"The world of today is you don't know there's an issue until a business impact two months down the road," she said.

Fraud scores 

Some AI users have built their own monitoring capabilities and that is what Fico said saved it at the start of the pandemic.

Alarms were triggered as more purchases occurred online - what the industry calls "card not present." Historically, more of this spending tends to be fraudulent and the surge pushed transactions higher on Fico's 1-to-999 scale (the higher it is, the more likely it is fraud), said Scott Zoldi, chief analytics officer at Fico.

Zoldi said consumer habits were changing too fast to rewrite the AI system. So Fico advised US clients to review and reject only transactions scored above 900, up from 850, he said. The change spared clients from reviewing 67 per cent of legitimate transactions above the old threshold, and allowed them instead to focus on truly problematic cases.
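The effect of moving a review threshold can be sketched in a few lines of Python. The 1-to-999 scale and the 850 and 900 cut-offs come from Zoldi's account; the function name and the ten sample scores are invented for illustration and do not reflect Fico's software or real transaction data.

```python
def flag_for_review(scores, threshold):
    """Return the scores strictly above `threshold` on a 1-to-999 scale."""
    for s in scores:
        if not 1 <= s <= 999:
            raise ValueError(f"score out of range: {s}")
    return [s for s in scores if s > threshold]


# Hypothetical fraud scores for ten transactions.
scores = [120, 430, 860, 875, 890, 910, 955, 640, 870, 990]

old_queue = flag_for_review(scores, 850)  # cut-off before the adjustment
new_queue = flag_for_review(scores, 900)  # cut-off after the adjustment

print(len(old_queue), "flagged at 850;", len(new_queue), "flagged at 900")
```

Raising the cut-off leaves the model itself untouched. It only changes which scores trigger a review, which is why such an adjustment can be made quickly while the underlying system is retrained or left alone.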

Clients went on to detect 25 per cent more of total US fraud during the first six months of the pandemic than would have been expected and 60 per cent more in the United Kingdom, Zoldi said.

"You are not responsible with AI unless you are monitoring," he said.
