Explainable artificial intelligence in forensics: Realistic explanations for number of contributor predictions of DNA profiles

M.S. Veldhuis; Simone Ariëns; Rolf J.F. Ypma; T.E.P.M.F. Abeel; Corina C.G. Benschop

doi:10.1016/j.fsigen.2021.102632

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Machine learning obtains good accuracy in determining the number of contributors (NOC) in short tandem repeat (STR) mixture DNA profiles. However, the models used so far are not understandable to users as they only output a prediction without any reasoning for that conclusion. Therefore, we leverage techniques from the field of explainable artificial intelligence (XAI) to help users understand why specific predictions are made. Where previous attempts at explainability for NOC estimation have relied upon using simpler, more understandable models that achieve lower accuracy, we use techniques that can be applied to any machine learning model. Our explanations incorporate SHAP values and counterfactual examples for each prediction into a single visualization. Existing methods for generating counterfactuals focus on uncorrelated features. This makes them inappropriate for the highly correlated features derived from STR data for NOC estimation, as these techniques simulate combinations of features that could not have resulted from an STR profile. For this reason, we have constructed a new counterfactual method, Realistic Counterfactuals (ReCo), which generates realistic counterfactual explanations for correlated data. We show that ReCo outperforms state-of-the-art methods on traditional metrics, as well as on a novel realism score. A user evaluation of the visualization shows positive opinions of end-users, which is ultimately the most appropriate metric in assessing explanations for real-world settings.
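The abstract's concern about unrealistic counterfactuals can be illustrated with a small sketch. This is not ReCo, whose algorithm is not described in this record. It is a generic nearest-unlike-neighbour approach, shown only because it restricts counterfactuals to feature combinations that were actually observed and so cannot produce impossible ones. The two features, the toy `predict` rule, and the sample profiles are all hypothetical stand-ins for a trained NOC model and real STR-derived data.

```python
import math


def predict(features):
    """Toy stand-in for a trained NOC classifier (hypothetical rule).

    Uses the maximum-allele-count heuristic: each contributor adds at most
    two alleles per locus, so at least ceil(max_alleles / 2) contributors.
    """
    max_alleles, _total_alleles = features
    return max(1, math.ceil(max_alleles / 2))


def nearest_observed_counterfactual(query, observed, predict_fn):
    """Return the closest observed instance whose prediction differs from the query's.

    Because the result is drawn from observed data, its correlated features
    stay mutually consistent. Returns None if no such instance exists.
    """
    base = predict_fn(query)
    unlike = [row for row in observed if predict_fn(row) != base]
    if not unlike:
        return None
    return min(unlike, key=lambda row: math.dist(query, row))


# Hypothetical profiles: (max allele count at any locus, total allele count).
observed = [(2, 30), (3, 41), (4, 48), (5, 60), (6, 75)]
query = (4, 50)

counterfactual = nearest_observed_counterfactual(query, observed, predict)
print(predict(query), counterfactual, predict(counterfactual))
```

A method that perturbs each feature independently could instead propose, for example, a higher maximum allele count with an unchanged total, a combination that may not correspond to any real profile.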

Original language	English
Article number	102632
Journal	Forensic Science International: Genetics
Volume	56
DOIs	https://doi.org/10.1016/j.fsigen.2021.102632
Publication status	Published - 2022

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Number of contributors
Explainable artificial intelligence
DNA mixtures
Machine learning
Counterfactual explanations

Cite this

@article{b958ebede89446978678710a3c92d5e2,

title = "Explainable artificial intelligence in forensics: Realistic explanations for number of contributor predictions of DNA profiles",

abstract = "Machine learning obtains good accuracy in determining the number of contributors (NOC) in short tandem repeat (STR) mixture DNA profiles. However, the models used so far are not understandable to users as they only output a prediction without any reasoning for that conclusion. Therefore, we leverage techniques from the field of explainable artificial intelligence (XAI) to help users understand why specific predictions are made. Where previous attempts at explainability for NOC estimation have relied upon using simpler, more understandable models that achieve lower accuracy, we use techniques that can be applied to any machine learning model. Our explanations incorporate SHAP values and counterfactual examples for each prediction into a single visualization. Existing methods for generating counterfactuals focus on uncorrelated features. This makes them inappropriate for the highly correlated features derived from STR data for NOC estimation, as these techniques simulate combinations of features that could not have resulted from an STR profile. For this reason, we have constructed a new counterfactual method, Realistic Counterfactuals (ReCo), which generates realistic counterfactual explanations for correlated data. We show that ReCo outperforms state-of-the-art methods on traditional metrics, as well as on a novel realism score. A user evaluation of the visualization shows positive opinions of end-users, which is ultimately the most appropriate metric in assessing explanations for real-world settings.",

keywords = "Number of contributors, Explainable artificial intelligence, DNA mixtures, Machine learning, Counterfactual explanations",

author = "Veldhuis, M.S. and Ari{\"e}ns, Simone and Ypma, {Rolf J.F.} and Abeel, T.E.P.M.F. and Benschop, {Corina C.G.}",

note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} -- Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",

year = "2022",

doi = "10.1016/j.fsigen.2021.102632",

language = "English",

volume = "56",

journal = "Forensic Science International: Genetics",

issn = "1872-4973",

publisher = "Elsevier",

}

TY - JOUR

T1 - Explainable artificial intelligence in forensics: Realistic explanations for number of contributor predictions of DNA profiles

AU - Veldhuis, M.S.

AU - Ariëns, Simone

AU - Ypma, Rolf J.F.

AU - Abeel, T.E.P.M.F.

AU - Benschop, Corina C.G.

N1 - Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2022

Y1 - 2022

AB - Machine learning obtains good accuracy in determining the number of contributors (NOC) in short tandem repeat (STR) mixture DNA profiles. However, the models used so far are not understandable to users as they only output a prediction without any reasoning for that conclusion. Therefore, we leverage techniques from the field of explainable artificial intelligence (XAI) to help users understand why specific predictions are made. Where previous attempts at explainability for NOC estimation have relied upon using simpler, more understandable models that achieve lower accuracy, we use techniques that can be applied to any machine learning model. Our explanations incorporate SHAP values and counterfactual examples for each prediction into a single visualization. Existing methods for generating counterfactuals focus on uncorrelated features. This makes them inappropriate for the highly correlated features derived from STR data for NOC estimation, as these techniques simulate combinations of features that could not have resulted from an STR profile. For this reason, we have constructed a new counterfactual method, Realistic Counterfactuals (ReCo), which generates realistic counterfactual explanations for correlated data. We show that ReCo outperforms state-of-the-art methods on traditional metrics, as well as on a novel realism score. A user evaluation of the visualization shows positive opinions of end-users, which is ultimately the most appropriate metric in assessing explanations for real-world settings.

KW - Number of contributors

KW - Explainable artificial intelligence

KW - DNA mixtures

KW - Machine learning

KW - Counterfactual explanations

UR - http://www.scopus.com/inward/record.url?scp=85119935982&partnerID=8YFLogxK

U2 - 10.1016/j.fsigen.2021.102632

DO - 10.1016/j.fsigen.2021.102632

M3 - Article

SN - 1872-4973

VL - 56

JO - Forensic Science International: Genetics

JF - Forensic Science International: Genetics

M1 - 102632

ER -
