Faulty or Ready? Handling Failures in Deep-Learning Computer Vision Models until Deployment: A Study of Practices, Challenges, and Needs

Agathe Balayn; Natasa Rikalo; Jie Yang; Alessandro Bozzon

doi:10.1145/3544548.3581555

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Handling failures in computer vision systems that rely on deep learning models remains a challenge. While an increasing number of methods for bug identification and correction are proposed, little is known about how practitioners actually search for failures in these models. We perform an empirical study to understand the goals and needs of practitioners, the workflows and artifacts they use, and the challenges and limitations in their process. We interview 18 practitioners by probing them with a carefully crafted failure handling scenario. We observe that there is a great diversity of failure handling workflows in which cooperation is often necessary, that practitioners overlook certain types of failures and bugs, and that they generally do not rely on potentially relevant approaches and tools originally stemming from research. These insights allow us to draw up a list of research opportunities, such as creating a library of best practices and more representative formalisations of practitioners' goals, developing interfaces to exploit failure handling artifacts, as well as providing specialized training.

Original language	English
Title of host publication	CHI 2023 - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
Place of Publication	New York
Publisher	Association for Computing Machinery (ACM)
Number of pages	20
ISBN (Print)	978-1-4503-9421-5
DOIs	https://doi.org/10.1145/3544548.3581555
Publication status	Published - 2023
Event	2023 CHI Conference on Human Factors in Computing Systems - Congress Center Hamburg (CCH), Hamburg, Germany. Duration: 23 Apr 2023 → 28 Apr 2023. https://chi2023.acm.org/

Conference

Conference	2023 CHI Conference on Human Factors in Computing Systems
Abbreviated title	CHI'23
Country/Territory	Germany
City	Hamburg
Period	23/04/23 → 28/04/23
Internet address	https://chi2023.acm.org/

Keywords

debugging
explainability
machine learning testing
practices

Access to Document

10.1145/3544548.3581555

3544548.3581555 (Final published version, 2.13 MB; Licence: CC BY)

Cite this

Balayn, A., Rikalo, N., Yang, J., & Bozzon, A. (2023). Faulty or Ready? Handling Failures in Deep-Learning Computer Vision Models until Deployment: A Study of Practices, Challenges, and Needs. In CHI 2023 - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Article 11). Association for Computing Machinery (ACM). https://doi.org/10.1145/3544548.3581555

@inproceedings{2a445dad39b54fa1b0b6e2541e68aa70,

title = "Faulty or Ready? Handling Failures in Deep-Learning Computer Vision Models until Deployment: A Study of Practices, Challenges, and Needs",

abstract = "Handling failures in computer vision systems that rely on deep learning models remains a challenge. While an increasing number of methods for bug identification and correction are proposed, little is known about how practitioners actually search for failures in these models. We perform an empirical study to understand the goals and needs of practitioners, the workflows and artifacts they use, and the challenges and limitations in their process. We interview 18 practitioners by probing them with a carefully crafted failure handling scenario. We observe that there is a great diversity of failure handling workflows in which cooperations are often necessary, that practitioners overlook certain types of failures and bugs, and that they generally do not rely on potentially relevant approaches and tools originally stemming from research. These insights allow to draw a list of research opportunities, such as creating a library of best practices and more representative formalisations of practitioners' goals, developing interfaces to exploit failure handling artifacts, as well as providing specialized training.",

keywords = "debugging, explainability, machine learning testing, practices",

author = "Agathe Balayn and Natasa Rikalo and Jie Yang and Alessandro Bozzon",

year = "2023",

doi = "10.1145/3544548.3581555",

language = "English",

isbn = "978-1-4503-9421-5",

booktitle = "CHI 2023 - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems",

publisher = "Association for Computing Machinery (ACM)",

address = "United States",

note = "2023 CHI Conference on Human Factors in Computing Systems, CHI'23 ; Conference date: 23-04-2023 Through 28-04-2023",

url = "https://chi2023.acm.org/",

}

Balayn, A, Rikalo, N, Yang, J & Bozzon, A 2023, Faulty or Ready? Handling Failures in Deep-Learning Computer Vision Models until Deployment: A Study of Practices, Challenges, and Needs. in CHI 2023 - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 11, Association for Computing Machinery (ACM), New York, 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23/04/23. https://doi.org/10.1145/3544548.3581555

Faulty or Ready? Handling Failures in Deep-Learning Computer Vision Models until Deployment: A Study of Practices, Challenges, and Needs. / Balayn, Agathe; Rikalo, Natasa; Yang, Jie et al.
CHI 2023 - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. New York: Association for Computing Machinery (ACM), 2023. 11.

TY - GEN

T1 - Faulty or Ready? Handling Failures in Deep-Learning Computer Vision Models until Deployment

T2 - 2023 CHI Conference on Human Factors in Computing Systems

AU - Balayn, Agathe

AU - Rikalo, Natasa

AU - Yang, Jie

AU - Bozzon, Alessandro

PY - 2023

Y1 - 2023

AB - Handling failures in computer vision systems that rely on deep learning models remains a challenge. While an increasing number of methods for bug identification and correction are proposed, little is known about how practitioners actually search for failures in these models. We perform an empirical study to understand the goals and needs of practitioners, the workflows and artifacts they use, and the challenges and limitations in their process. We interview 18 practitioners by probing them with a carefully crafted failure handling scenario. We observe that there is a great diversity of failure handling workflows in which cooperations are often necessary, that practitioners overlook certain types of failures and bugs, and that they generally do not rely on potentially relevant approaches and tools originally stemming from research. These insights allow to draw a list of research opportunities, such as creating a library of best practices and more representative formalisations of practitioners' goals, developing interfaces to exploit failure handling artifacts, as well as providing specialized training.

KW - debugging

KW - explainability

KW - machine learning testing

KW - practices

UR - http://www.scopus.com/inward/record.url?scp=85160012999&partnerID=8YFLogxK

U2 - 10.1145/3544548.3581555

DO - 10.1145/3544548.3581555

M3 - Conference contribution

AN - SCOPUS:85160012999

SN - 978-1-4503-9421-5

BT - CHI 2023 - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

PB - Association for Computing Machinery (ACM)

CY - New York

Y2 - 23 April 2023 through 28 April 2023

ER -
