• An automatic sleep stage scoring method based on data augmentation, an ensemble of convolutional neural networks, and expert knowledge was proposed.
• A decision-visualization method, named Grad-CAM, was applied to help clinical staff and doctors understand the decision-making of the CNN.
• The experimental results showed that the averaged accuracy, kappa coefficient, and F1 score of the proposed method were 93.78%, 91.18%, and 88.93%, respectively.

Scoring sleep data is subjective and time-consuming: it takes more than one hour to score a whole night's polysomnography (PSG) data. Automatic sleep stage scoring systems are therefore needed to reduce the clinical workload. In this paper, an automatic sleep stage scoring method that combines data augmentation, an ensemble convolutional neural network (CNN), and expert knowledge is proposed. In addition, a decision-visualization method, named Grad-CAM, was used to help clinical staff and doctors understand the decision-making of the CNN. All-night sleep physiological signals from 19 healthy individuals and 23 insomnia patients were used. First, the all-night electroencephalogram was segmented into 30-sec segments. Subsequently, each segment was transformed into a spectrogram by continuous wavelet transform, and a simple data augmentation was applied to increase the variety of the training data. Next, the spectrograms were used as the input for training our proposed CNN, named Spectrogram Net (SNet), and 12 other well-known CNNs. The three trained CNN models with the highest accuracy were used to form an ensemble model. After classification by the ensemble CNN, smoothing rules, an expert knowledge-based post-processing step, were applied to correct unreasonable sleep stage transitions. To validate the robustness of the proposed method, 2-fold cross-validation was used. The experimental results showed that the averaged accuracy, kappa coefficient, and F1 score of the proposed method were 93.78%, 91.18%, and 88.93%, respectively.
The results showed that the proposed method, which combines data augmentation, an ensemble CNN, and expert knowledge, achieved high accuracy for spectrogram-based image classification. [ABSTRACT FROM AUTHOR]
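
The segmentation, ensemble, and post-processing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how the three CNNs are combined or which smoothing rules are used, so the majority vote in `ensemble_vote` and the isolated-epoch rule in `smooth_isolated` are assumptions chosen for illustration, and all function names are hypothetical. Only the 30-sec segment length comes from the abstract.

```python
EPOCH_SEC = 30  # segment length stated in the abstract


def segment_epochs(signal, fs):
    """Split a 1-D signal into non-overlapping 30-sec segments.

    fs is the sampling rate in Hz; a trailing partial segment is dropped.
    """
    n = int(EPOCH_SEC * fs)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def ensemble_vote(predictions):
    """Combine per-model stage labels by majority vote (assumed rule).

    predictions is a list of label sequences, one per model, all of the
    same length. On a tie, the label of the first-listed model wins,
    because max() returns the first maximal element.
    """
    return [max(votes, key=votes.count) for votes in zip(*predictions)]


def smooth_isolated(stages):
    """Hypothetical smoothing rule, not taken from the paper.

    A segment whose two neighbours agree with each other but differ from
    it is relabelled to the neighbours' stage. Comparisons use the
    original sequence, so corrections do not cascade.
    """
    out = list(stages)
    for i in range(1, len(stages) - 1):
        if stages[i - 1] == stages[i + 1] != stages[i]:
            out[i] = stages[i - 1]
    return out


if __name__ == "__main__":
    # Three hypothetical models scoring five segments.
    model_a = ["W", "N2", "REM", "N2", "N3"]
    model_b = ["W", "N2", "REM", "N2", "N3"]
    model_c = ["N1", "N2", "N2", "N2", "N2"]
    voted = ensemble_vote([model_a, model_b, model_c])
    print(smooth_isolated(voted))
```

In a full pipeline each segment would be converted to a spectrogram and classified by the trained CNNs before voting; those steps are omitted here because they depend on model details the abstract does not give.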