Deep convolutional neural networks (DCNNs) are difficult to deploy on resource-constrained devices because of their high computational and memory requirements. Network pruning, an essential model compression technique, helps make such deployment efficient. Compared with traditional rule-based pruning methods, reinforcement learning (RL)-based automatic pruning often yields more effective pruning strategies because it can learn and adapt. However, existing work uses only a single agent to explore the optimal pruning rate for all convolutional layers, ignoring the interactions and effects among layers. To address this limitation, this paper proposes QMIX_FP, an automatic filter pruning method based on the multi-agent reinforcement learning algorithm QMIX. The multi-layer structure of a DCNN is modeled as a multi-agent system, which accounts for the varying sensitivity of each convolutional layer with respect to the entire DCNN and for the interactions among layers. We employ QMIX, in which each individual agent contributes monotonically to the system, to explore the optimal iterative pruning strategy for each convolutional layer. Furthermore, fine-tuning the pruned network with knowledge distillation speeds up the improvement of model performance. The efficiency of the method is demonstrated on two benchmark DCNNs, VGG-16 and AlexNet, on the CIFAR-10 and CIFAR-100 datasets. Extensive experiments under different scenarios show that QMIX_FP reduces the computational and memory requirements of the networks while maintaining their accuracy, which supports the efficient deployment of deep learning models on resource-constrained devices.
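The monotonic contribution of each agent mentioned above is usually written, in the standard QMIX formulation, as a constraint on the mixing function. The following is a sketch in assumed notation (the symbols $Q_{tot}$, $Q_i$, $f_{mix}$, $s$, and $N$ do not appear in this abstract, and reading each agent $i$ as one convolutional layer is an interpretation of the text, not a stated definition):

```latex
\[
Q_{tot}(\boldsymbol{\tau}, \mathbf{u})
  = f_{mix}\bigl(Q_1(\tau_1, u_1), \dots, Q_N(\tau_N, u_N);\, s\bigr),
\qquad
\frac{\partial Q_{tot}}{\partial Q_i} \ge 0
\quad \forall\, i \in \{1, \dots, N\}.
\]
```

Under this constraint, the joint action that maximizes $Q_{tot}$ coincides with each agent greedily maximizing its own $Q_i$, so per-agent choices (here, presumably per-layer pruning decisions) remain consistent with the global objective.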
Keywords: Filter pruning; Knowledge distillation; QMIX algorithm.
© 2024. The Author(s).