The No. 1 Question to Ask When Evaluating AI Tools

Sarah Lebovitz; Hila Lifshitz-Assaf; Natalia Levina

Magazine Spring 2023 Issue Research Highlight


Determining whether an AI solution is worth implementing requires looking past performance reports and finding the ground truth on which the AI has been trained and validated.

Sarah Lebovitz, Hila Lifshitz-Assaf, and Natalia Levina, March 07, 2023


In the fast-moving and highly competitive artificial intelligence sector, developers’ claims that their AI tools can make critical predictions with a high degree of accuracy are key to selling prospective customers on their value. Because it can be daunting for people who are not AI experts to evaluate these tools, leaders may be tempted to rely on the high-level performance metrics published in sales materials. But doing so often leads to disappointing or even risky implementations.

Over the course of an 11-month investigation, we observed managers in a leading health care organization as they conducted internal pilot studies of five AI tools. Impressive performance results had been promised for each, but several of the tools did extremely poorly in their pilots. Analyzing the evaluation process, we found that an effective way to determine an AI tool’s quality is understanding and examining its ground truth [1]. In this article, we’ll explain what that is and how managers can dig into it to better assess whether a particular AI tool may enhance or diminish decision-making in their organization.


What Is the Ground Truth of the AI Tool?

The quality of an AI tool, and the value it can bring your organization, depends on the quality of the ground truth used to train and validate it. In general, ground truth is defined as information that is known to be true based on objective, empirical evidence. In AI, ground truth refers to the labeled data in training data sets that teaches an algorithm how to arrive at a predicted output; it is treated as the “correct” answer to the prediction problem that the tool is learning to solve. This data then becomes the standard against which developers measure the accuracy of the system’s predictions. For instance, teaching a model to identify the best job candidates requires training data sets describing candidates’ features, such as education and years of experience, where each candidate is associated with a classification of either “good candidate” (true) or “not a good candidate” (false).
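To make the job-candidate example concrete, here is a minimal Python sketch of what such a labeled data set and an accuracy check against it might look like. Everything in it is an illustrative assumption rather than anything taken from the tools we studied: the field names, the six invented rows, and the hand-written predict rule, which stands in for a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    education_years: int   # hypothetical feature
    experience_years: int  # hypothetical feature
    good_candidate: bool   # ground-truth label: True = "good candidate"


# Invented rows for illustration only; a real training set would be far larger.
LABELED_CANDIDATES = [
    Candidate(16, 5, True),
    Candidate(12, 1, False),
    Candidate(18, 2, True),
    Candidate(14, 8, True),
    Candidate(12, 0, False),
    Candidate(16, 1, False),
]


def predict(candidate: Candidate) -> bool:
    """Toy stand-in for a trained model: a fixed rule, not a learned one."""
    return candidate.experience_years >= 3


def accuracy(rows, predictor) -> float:
    """Share of rows where the prediction matches the ground-truth label."""
    correct = sum(predictor(row) == row.good_candidate for row in rows)
    return correct / len(rows)


if __name__ == "__main__":
    score = accuracy(LABELED_CANDIDATES, predict)
    print(f"Accuracy against ground truth: {score:.2f}")
```

In this sketch the rule agrees with the labels on five of the six rows. Note that the score only says how often predictions match the good_candidate column; it says nothing about whether those labels were themselves correct.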


About the Authors

Sarah Lebovitz is an assistant professor at the McIntire School of Commerce at the University of Virginia. Hila Lifshitz-Assaf is a professor at Warwick University and a faculty affiliate at the Lab for Innovation Science at Harvard. Natalia Levina is a professor at New York University’s Stern School of Business.

References

1. S. Lebovitz, N. Levina, and H. Lifshitz-Assaf, “Is AI Ground Truth Really True? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What,” MIS Quarterly 45, no. 3 (September 2021): 1501-1525.

2. C. DeBrusk, “The Risk of Machine-Learning Bias (and How to Prevent It),” MIT Sloan Management Review, March 26, 2018, https://sloanreview.mit.edu.

3. “Classification: ROC Curve and AUC,” Machine Learning Crash Course, Google, last modified July 18, 2022, https://developers.google.com.

Tags: Artificial Intelligence

Reprint #: 64314


Comment (1)

Marwah Younis

December 13, 2023

I really enjoyed reading this article, and it gave me a real 'AHA!' moment. We often talk about how useless tools are when they just 'digitize' paper-based work; it appears that the same is true of AI tools that just 'digitize' initial human judgements without validating outcomes. The HR example, especially, made this concept straightforward to understand. When evaluating AI tools for clinical practice, understanding the 'ground truth' and aligning the developers' ground truth with the actual gold standards of experts in the field will be of the utmost importance.

We want to develop AI tools that will be better than humans at making predictions, not just as good as humans at making predictions.
