Polarization gradient cooling (PGC) plays an important role in many cold-atom applications, including the formation of Bose-Einstein condensates (BECs) and the cooling of single atoms. Traditional optimization of PGC parameters usually relies on subjective expertise, struggles with fine control, and is inefficient. Here, we propose a segmented control method that departs from the traditional PGC process by expanding the number of experiment parameters from 3 to 30. We then reformulate the conventional timing optimization problem as a Markov decision process (MDP) and optimize the experiment parameters with a reinforcement learning model. With properly chosen hyperparameters, the learning process converges well and explores the parameter space effectively. Finally, we capture ∼4.3 × 10⁸ cold atoms, with a phase space density of ∼7.1 × 10⁻⁴ at a temperature of ∼3.7 µK in ∼18.8 min. Our work paves the way for the intelligent preparation of degenerate quantum gases.