Exploring the Impact of Deleting (or Retaining) a Biased Item: A Procedure Based on Classification Accuracy

Assessment. 2024 Dec 10:10731911241298081. doi: 10.1177/10731911241298081. Online ahead of print.

Abstract

Psychological test scores are commonly used in high-stakes settings to classify individuals. While measurement invariance across groups is necessary for valid and meaningful inferences about group differences, full measurement invariance rarely holds in practice. The classification accuracy analysis framework aims to quantify the degree and practical impact of noninvariance. However, how best to navigate the next steps remains unclear, and methods devised to account for noninvariance at the group level may be insufficient when the goal is classification. Furthermore, deleting a biased item may improve fairness but negatively affect performance, and replacing the test can be costly. We propose item-level effect size indices that allow test users to make more informed decisions by quantifying the impact of deleting (or retaining) an item on test performance and fairness, provide an illustrative example, and introduce unbiasr, an R package implementing the proposed methods.
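
To make the underlying idea concrete, below is a minimal simulation sketch in base R of a classification accuracy comparison: two groups share the same latent trait distribution, one item has a lower intercept in the focal group, and selection indices (proportion selected, success ratio, sensitivity, specificity) are compared with the biased item retained versus deleted. All parameter values, cut points, and the simulation setup are illustrative assumptions; this is not the authors' proposed effect size indices nor the unbiasr API.

```r
set.seed(1)
n <- 5000                                  # examinees per group
lambda <- rep(0.7, 6)                      # common loadings for 6 items
nu <- rep(0, 6)                            # reference-group intercepts
nu_bias <- nu; nu_bias[6] <- -0.5          # item 6: lower intercept in the focal group

# Simulate item scores under a one-factor model and form total scores
sim_scores <- function(n, lambda, nu, sd_e = 0.6) {
  eta <- rnorm(n)                          # latent trait, same distribution in both groups
  y <- sapply(seq_along(lambda), function(j)
    nu[j] + lambda[j] * eta + rnorm(n, sd = sd_e))
  list(eta = eta, total = rowSums(y), total_drop = rowSums(y[, -6]))
}

ref <- sim_scores(n, lambda, nu)           # reference group: all items invariant
foc <- sim_scores(n, lambda, nu_bias)      # focal group: item 6 is biased

# Classification accuracy indices for one group:
# "truth" = latent trait above its cut; "selected" = observed score above its cut
accuracy <- function(eta, obs, cut_eta, cut_obs) {
  truth <- eta > cut_eta
  sel   <- obs > cut_obs
  c(PS = mean(sel),                        # proportion selected
    SR = mean(truth[sel]),                 # success ratio (positive predictive value)
    SE = mean(sel[truth]),                 # sensitivity
    SP = mean(!sel[!truth]))               # specificity
}

cut_eta  <- qnorm(0.8)                                            # select top 20% on the trait
cut_full <- quantile(c(ref$total, foc$total), 0.8)                # observed cut, full test
cut_drop <- quantile(c(ref$total_drop, foc$total_drop), 0.8)      # observed cut, item 6 deleted

# Group differences in the indices under "full" reflect item bias;
# they should largely disappear under "drop"
round(rbind(
  ref_full = accuracy(ref$eta, ref$total,      cut_eta, cut_full),
  foc_full = accuracy(foc$eta, foc$total,      cut_eta, cut_full),
  ref_drop = accuracy(ref$eta, ref$total_drop, cut_eta, cut_drop),
  foc_drop = accuracy(foc$eta, foc$total_drop, cut_eta, cut_drop)), 3)
```

In this sketch, any between-group gap in the indices for the full test is attributable to the biased intercept alone, since the latent trait distributions are identical; the comparison of "full" versus "drop" rows mimics the kind of question the paper's item-level indices address, i.e., what is gained or lost in accuracy and fairness by deleting a biased item.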

Keywords: R package; classification accuracy; fairness; item bias; measurement invariance.