Impairment of arbitration between model-based and model-free reinforcement learning in obsessive-compulsive disorder

Zhongqiang Ruan; Carol A Seger; Qiong Yang; Dongjae Kim; Sang Wan Lee; Qi Chen; Ziwen Peng

doi:10.3389/fpsyt.2023.1162800

Front Psychiatry. 2023 May 26:14:1162800. doi: 10.3389/fpsyt.2023.1162800. eCollection 2023.

Authors

Zhongqiang Ruan¹, Carol A Seger¹ ², Qiong Yang³, Dongjae Kim⁴, Sang Wan Lee⁵, Qi Chen⁶, Ziwen Peng¹ ⁷ ⁸

Affiliations

¹ Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, Center for Studies of Psychological Application, South China Normal University, Guangzhou, China.
² Department of Psychology, Colorado State University, Fort Collins, CO, United States.
³ Affective Disorder Center, Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, China.
⁴ Department of AI-based Convergence, College of Engineering, Dankook University, Yongin, Republic of Korea.
⁵ Department of Bio and Brain Engineering, Program of Brain and Cognitive Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.
⁶ School of Psychology, Shenzhen University, Shenzhen, China.
⁷ Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, Guangzhou, China.
⁸ Department of Child Psychiatry, Shenzhen Kangning Hospital, Shenzhen University School of Medicine, Shenzhen, China.

Abstract

Introduction: Obsessive-compulsive disorder (OCD) is characterized by an imbalance between the goal-directed and habitual learning systems in behavioral control. It remains unclear whether this imbalance reflects an abnormality of a single system, the goal-directed system, or an impairment in a separate arbitration mechanism that selects which system controls behavior at each point in time.

Methods: A total of 30 OCD patients and 120 healthy controls performed a 2-choice, 3-stage Markov decision-making paradigm. Reinforcement learning models were used to estimate goal-directed learning (as model-based reinforcement learning) and habitual learning (as model-free reinforcement learning). For the analysis, 29 controls with high Obsessive-Compulsive Inventory-Revised (OCI-R) scores, 31 controls with low OCI-R scores, and all 30 OCD patients were selected.
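
The abstract does not specify the computational model, so the following is only a generic, textbook-style sketch of the distinction the Methods draw on, not the authors' implementation. It assumes a hypothetical toy task (seven states, two actions, made-up transition probabilities and rewards), a Q-learning update for the model-free learner, value iteration over a known transition model for the model-based learner, and a fixed mixing weight `w` standing in for arbitration. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

N_STATES, N_ACTIONS = 7, 2  # hypothetical layout: 0 = stage 1, 1-2 = stage 2, 3-6 = outcome states


def toy_task():
    """Build an illustrative transition model T[s, a, s'] and state rewards R[s'] (made-up numbers)."""
    T = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    T[0, 0, 1], T[0, 0, 2] = 0.7, 0.3  # stage 1, action 0: mostly leads to state 1
    T[0, 1, 1], T[0, 1, 2] = 0.3, 0.7  # stage 1, action 1: mostly leads to state 2
    T[1, 0, 3] = T[1, 1, 4] = 1.0      # stage 2 choices lead deterministically to outcomes
    T[2, 0, 5] = T[2, 1, 6] = 1.0
    R = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.5, 0.2])
    return T, R


def model_free_update(q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    """One temporal-difference (Q-learning) update from a single experienced transition."""
    td_error = r + gamma * q[s_next].max() - q[s, a]
    q = q.copy()
    q[s, a] += alpha * td_error
    return q, td_error


def model_based_values(T, R, gamma=1.0, n_sweeps=3):
    """Plan action values by repeated Bellman backups through the transition model."""
    q = np.zeros(T.shape[:2])
    for _ in range(n_sweeps):
        v = q.max(axis=1)        # best achievable value from each state
        q = T @ (R + gamma * v)  # expected reward plus value of the successor state
    return q


def arbitrated_choice_probs(q_mb, q_mf, s, w, beta=5.0):
    """Softmax over a weighted mix of both value estimates; w = 1 is purely model-based."""
    q = w * q_mb[s] + (1.0 - w) * q_mf[s]
    z = np.exp(beta * (q - q.max()))  # subtract the max for numerical stability
    return z / z.sum()


if __name__ == "__main__":
    T, R = toy_task()
    q_mb = model_based_values(T, R)
    q_mf = np.zeros((N_STATES, N_ACTIONS))
    q_mf, _ = model_free_update(q_mf, s=1, a=0, r=1.0, s_next=3)
    print(arbitrated_choice_probs(q_mb, q_mf, s=0, w=0.8))
```

In this sketch the model-based learner values a first-stage action as soon as the transition structure and rewards are known, whereas the model-free learner only changes a value after experiencing that specific transition. An arbitration mechanism in the sense of the abstract would adjust `w` over time according to task demands rather than hold it fixed.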

Results: OCD patients made less appropriate strategy choices than controls, regardless of whether the controls had high (p = 0.012) or low (p < 0.001) OCI-R scores. Specifically, they made greater use of the model-free strategy in task conditions where the model-based strategy was optimal. Furthermore, both OCD patients (p = 0.001) and control subjects with high OCI-R scores (H-OCI-R; p = 0.009) showed greater system switching, rather than consistent strategy use, in task conditions where model-free use was optimal.

Conclusion: These findings indicate an impaired arbitration mechanism for flexible adaptation to environmental demands in both OCD patients and healthy individuals reporting high OCI-R scores.

Keywords: arbitration system; goal-directed system; habitual system; model-based reinforcement learning; model-free reinforcement learning; obsessive-compulsive disorder.

Grants and funding

This study was supported by the National Science and Technology Innovation 2030 Major Program (no. 2021ZD0203800 to QC), the National Natural Science Foundation of China (nos. 32071049 and 31671135 to QC, 32171081, 31871113, and 31920103009 to ZP), and the Guangdong Basic and Applied Basic Research Foundation, China (no. 2022A1515012185 to QC).