Moths are pests that pose a major threat to food production in China, so monitoring and preventing moth infestations is of great significance. Moth species are highly diverse, differ only slightly from one another, and are difficult to identify. To address these problems, a semantic segmentation network based on depthwise separable convolution, a channel attention mechanism, and pyramid pooling, named the depthwise squeeze-and-excitation pyramid network (DSEPNet), was proposed. A channel attention mechanism was added to UNet to enhance the network's ability to extract the texture features and wing edge information of moths. Depthwise separable convolution was used to increase the computational speed of the model and reduce its number of parameters. A pyramid pooling module was added between the encoder and decoder so that the model could accept input images of arbitrary size while enhancing its ability to learn feature information at different scales. DSEPNet was evaluated by ablation and comparison experiments. Compared with UNet, DSEPNet improved accuracy, mean intersection over union (mIoU), and F1-score by 2.04%, 9.14%, and 4.08%, respectively. On the moth dataset, the mIoU of DSEPNet was 3.04% higher than that of R2AU-Net. To verify the generalization of the model, comparison experiments were conducted on the Pascal VOC 2012 dataset, where the mIoU of DSEPNet was 0.51% higher than that of PSPNet and 0.18% higher than that of DeepLabv3. In addition, an automatic dataset annotation algorithm was proposed that generates semantic segmentation annotation files automatically, avoiding the time-consuming and laborious process of manual annotation. DSEPNet can be installed on traps to identify moths in real time and improve identification accuracy.
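The parameter reduction attributed to depthwise separable convolution can be illustrated with a short calculation. The sketch below is not the authors' implementation: the function names and the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are assumptions chosen only for illustration, and bias terms are omitted. It counts the weights of a standard convolution and of a depthwise convolution followed by a 1 x 1 pointwise convolution.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes; these are assumptions, not values from the paper.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
    # prints "standard: 73728, separable: 8768, ratio: 0.119"
```

For this hypothetical layer the separable form needs roughly 12% of the weights of the standard form. In general the ratio is 1/c_out + 1/k², so the saving grows with the number of output channels and the kernel size.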