Improved Image Caption Rating - Datasets, Game, and Model

Andrew Taylor Scott; Lothar D Narins; Anagha Kulkarni; Mar Castanon; Benjamin Kao; Shasta Ihorn; Yue-Ting Siu; Ilmi Yoon

doi:10.1145/3544549.3585632

Ext Abstr Hum Factors Computing Syst. 2023 Apr:2023:172. doi: 10.1145/3544549.3585632. Epub 2023 Apr 19.

Authors

Andrew Taylor Scott¹, Lothar D Narins¹, Anagha Kulkarni¹, Mar Castanon¹, Benjamin Kao¹, Shasta Ihorn², Yue-Ting Siu³, Ilmi Yoon¹

Affiliations

¹ Department of Computer Science, San Francisco State University, San Francisco, CA, USA.
² Department of Psychology, San Francisco State University, San Francisco, CA, USA.
³ Department of Special Education, San Francisco State University, San Francisco, CA, USA.

Abstract

How well a caption fits an image can be difficult to assess due to the subjective nature of caption quality. What is a good caption? We investigate this problem by focusing on image-caption ratings and by generating high-quality datasets from human feedback with gamification. We validate the datasets by showing a higher level of inter-rater agreement and by using them to train custom machine learning models to predict new ratings. Our approach outperforms previous metrics: the resulting datasets are more easily learned and are of higher quality than other currently available datasets for image-caption rating.
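To make the notion of inter-rater agreement concrete, the sketch below computes Cohen's kappa for two raters who score the same image-caption pairs. This is only an illustration: the abstract does not say which agreement statistic the authors used, and the function name `cohen_kappa` and the integer rating scale in the example are assumptions, not details from the paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters over the same items (illustrative only).

    Assumption: each list holds one categorical rating per image-caption
    pair, in the same order for both raters.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")

    n = len(ratings_a)

    # Observed agreement: fraction of items given the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: probability that both raters pick the same
    # category if each rated independently with their own marginals.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical ratings on a hypothetical 1-5 scale.
    rater_1 = [5, 4, 2, 1, 3, 5]
    rater_2 = [5, 4, 1, 1, 3, 4]
    print(cohen_kappa(rater_1, rater_2))
```

A value near 1 indicates agreement well above chance, and a value near 0 indicates agreement no better than chance. With more than two raters, or with ordinal ratings where near-misses should count partially, a different statistic would be more appropriate.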

Keywords: human-in-the-loop; image captioning; multimodal learning; visually-impaired.

Grants and funding

90REGE0018/ACL/ACL HHS/United States