Impairment of arbitration between model-based and model-free reinforcement learning in obsessive-compulsive disorder

Front Psychiatry. 2023 May 26:14:1162800. doi: 10.3389/fpsyt.2023.1162800. eCollection 2023.

Abstract

Introduction: Obsessive-compulsive disorder (OCD) is characterized by an imbalance between goal-directed and habitual learning systems in behavioral control, but it is unclear whether this imbalance reflects an abnormality confined to the goal-directed system or an impairment in a separate arbitration mechanism that selects which system controls behavior at each point in time.

Methods: A total of 30 OCD patients and 120 healthy controls performed a 2-choice, 3-stage Markov decision-making paradigm. Reinforcement learning models were used to estimate goal-directed learning (as model-based reinforcement learning) and habitual learning (as model-free reinforcement learning). For the analysis, 29 controls with high Obsessive-Compulsive Inventory-Revised (OCI-R) scores, 31 controls with low OCI-R scores, and all 30 OCD patients were selected.
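
As a rough illustration of what estimating the two systems can involve, the sketch below combines a model-free value (updated by a reward prediction error) with a model-based value (computed from a transition model) through a mixing weight. This is a generic textbook-style formulation and not necessarily the model fitted in this study; the function names, the weight `w`, the learning rate `alpha`, the inverse temperature `beta`, and all numeric values are assumptions made for illustration.

```python
import numpy as np

def model_free_update(q, action, reward, alpha=0.5):
    """Delta-rule update of a cached action value (model-free learning)."""
    q = q.copy()
    q[action] += alpha * (reward - q[action])
    return q

def model_based_values(transitions, next_state_values):
    """Expected value of each action under a known transition model."""
    return transitions @ next_state_values

def arbitrate(q_mb, q_mf, w):
    """Weighted mixture: w = 1 is purely model-based, w = 0 purely model-free."""
    return w * q_mb + (1.0 - w) * q_mf

def softmax(values, beta=3.0):
    """Choice probabilities from action values (beta is an assumed parameter)."""
    z = beta * (values - values.max())
    p = np.exp(z)
    return p / p.sum()

# Hypothetical two-action example; every number here is made up.
transitions = np.array([[0.7, 0.3],   # action 0 -> next states 0 / 1
                        [0.3, 0.7]])  # action 1 -> next states 0 / 1
next_state_values = np.array([1.0, 0.0])

q_mb = model_based_values(transitions, next_state_values)
q_mf = model_free_update(np.zeros(2), action=1, reward=1.0)
q_net = arbitrate(q_mb, q_mf, w=0.5)
probs = softmax(q_net)
```

In this toy case the model-based system favors action 0 while the model-free system favors the recently rewarded action 1, so the weight `w` determines which preference wins. An arbitration account of the kind discussed here concerns how that weight is set across task conditions.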

Results: OCD patients showed less appropriate strategy choices than controls regardless of whether the OCI-R scores in the control subjects were high (p = 0.012) or low (p < 0.001), specifically showing greater model-free strategy use in task conditions where the model-based strategy was optimal. Furthermore, OCD patients (p = 0.001) and control subjects with high OCI-R scores (H-OCI-R; p = 0.009) both showed greater system switching, rather than consistent strategy use, in task conditions where model-free use was optimal.

Conclusion: These findings indicated an impaired arbitration mechanism for flexible adaptation to environmental demands in both OCD patients and healthy individuals reporting high OCI-R scores.

Keywords: arbitration system; goal-directed system; habitual system; model-based reinforcement learning; model-free reinforcement learning; obsessive-compulsive disorder.

Grants and funding

This study was supported by the National Science and Technology Innovation 2030 Major Program (no. 2021ZD0203800 to QC), the National Natural Science Foundation of China (nos. 32071049 and 31671135 to QC, 32171081, 31871113, and 31920103009 to ZP), and the Guangdong Basic and Applied Basic Research Foundation, China (no. 2022A1515012185 to QC).