Consistent repetition of an action leads to plastic change in the motor cortex and causes a shift in the direction of future movements. This process is known as use-dependent plasticity (UDP), one of the basic forms of motor memory. We recently demonstrated in a physiological study that success-related reinforcement signals can modulate the strength of UDP. Here, we tested this idea by developing a computational approach that models the shift in the direction of future actions as a change in the preferred direction of the population activity of neurons in the primary motor cortex. The rate of this change follows a modified temporal difference reinforcement learning algorithm, in which the learning policy is based on a comparison between the reward the population experiences on a particular trial and the reward it had expected on the basis of its previous learning. Using this model, we were able to characterize the learning and retention of UDP. Exploring the relationship between reinforcement and UDP constitutes a crucial step toward understanding the basic building blocks involved in the formation of motor memories.
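
To make the idea concrete, the following is a minimal sketch of a reward-modulated shift in preferred direction, not the model used in the study. It assumes a single population-level preferred direction, a scalar expected reward updated by a simple prediction-error rule, and a shift rate that grows with the reward prediction error. The function names (`wrap_angle`, `simulate_udp`) and the parameters (`base_rate`, `rpe_gain`, `alpha`) are hypothetical and chosen only for illustration.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def simulate_udp(trained_dir, rewards, pd0=0.0,
                 base_rate=0.02, rpe_gain=0.1, alpha=0.2):
    """Track the population preferred direction over repeated trials.

    trained_dir: direction (radians) of the repeated movement.
    rewards:     reward received on each trial (e.g. 1.0 = success).
    All parameter values are illustrative assumptions.
    """
    pd = pd0          # current preferred direction of the population
    expected = 0.0    # reward expected from previous learning
    history = []
    for r in rewards:
        # Reward prediction error: experienced minus expected reward.
        rpe = r - expected
        # Update the expectation toward the experienced reward.
        expected += alpha * rpe
        # Assumed rule: the shift rate is a baseline plus a term
        # proportional to the prediction error, floored at zero.
        rate = max(0.0, base_rate + rpe_gain * rpe)
        # Move the preferred direction toward the trained direction.
        pd = wrap_angle(pd + rate * wrap_angle(trained_dir - pd))
        history.append(pd)
    return history


# Compare repetition with and without success-related reward.
rewarded = simulate_udp(math.pi / 2, [1.0] * 20)
unrewarded = simulate_udp(math.pi / 2, [0.0] * 20)
```

Under these assumptions, the rewarded sequence ends with a preferred direction closer to the trained direction than the unrewarded one, which is the qualitative effect of reinforcement on UDP described above.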