Impacts of Model Choice in XAI

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Supervisor / Co-Supervisor and Their Department(s)

Citation for Previous Publication

Link to Related Item

Abstract

Explainable artificial intelligence (XAI) models are becoming increasingly important as restrictions grow on the corporate use of black-box models whose predictions affect people's lives yet cannot be interpreted. Black boxes do not convey trust to end users and are difficult for developers to train and debug. Model-agnostic explanation methods, such as SHAP [23], can be applied post hoc to shed light on black-box predictions: given access to a model's predictions, SHAP generates relative importance scores for its features. This work focuses on explanations generated for Natural Language Processing (NLP), where the features SHAP scores are words. There is currently no generally accepted method for generating explanations in NLP; however, SHAP can assign an importance score to each word, and the most important words can be taken as the explanation. SHAP should be structure-agnostic, meaning it should not be influenced by the number or types of layers in the model, only by the quality of the prediction. Otherwise, SHAP explanations cannot be fairly compared across models, because SHAP may be biased towards certain structures. Importance scores from SHAP are converted to a mask that either includes or ignores each word of the input, yielding the generated explanation. The ERASER [10] dataset provides human-annotated explanations for NLP tasks that can serve as a gold standard for the explanations generated by SHAP. An F1 score comparing the generated explanation to the human-annotated explanation then serves as a measure of explanation quality. This work investigates whether the quality of explanations generated by SHAP is structure-agnostic. Using a dataset with ground-truth explanations for a sentiment analysis task, we compare SHAP output across different types of models.
Our main finding is that CNN models using intrinsic explanation underperformed CNN models without intrinsic explanation while achieving nearly identical accuracy. These findings demonstrate that the underlying model can affect SHAP's performance and that SHAP may favour certain model structures.
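The evaluation pipeline the abstract describes (importance scores → binary word mask → token-level F1 against a human annotation) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names, the top-k masking rule, and the example scores are assumptions for demonstration.

```python
def scores_to_mask(scores, k):
    """Hypothetical masking rule: keep the k highest-scoring words (1), ignore the rest (0)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [1 if i in top else 0 for i in range(len(scores))]

def token_f1(pred_mask, gold_mask):
    """Token-level F1 between a generated explanation mask and a gold annotation mask."""
    tp = sum(1 for p, g in zip(pred_mask, gold_mask) if p and g)
    fp = sum(1 for p, g in zip(pred_mask, gold_mask) if p and not g)
    fn = sum(1 for p, g in zip(pred_mask, gold_mask) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: a 6-word input with illustrative SHAP-style scores; the human
# annotator marked words 1 and 2 as the explanation.
scores = [0.1, 0.8, 0.6, 0.05, 0.3, 0.0]
gold = [0, 1, 1, 0, 0, 0]
pred = scores_to_mask(scores, k=2)
print(pred)                  # [0, 1, 1, 0, 0, 0]
print(token_f1(pred, gold))  # 1.0
```

The choice of k (how many top-scoring words form the explanation) is itself a free parameter; one reasonable convention is to match the length of the gold annotation.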

Item Type

http://purl.org/coar/resource_type/c_46ec

Alternative

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en

Location

Time Period

Source