• DTr-CNN implements time series forecasting transfer learning across different datasets. • DTr-CNN alleviates the problem of lacking labeled target data in time series prediction. • Instead of only fine-tuning, DTr-CNN embeds the transfer phase into feature learning. • DTr-CNN incorporates DTW and JS divergence to evaluate similarity between datasets. • DTr-CNN takes advantages of CNN and applies it to forecasting problems. Due to the extensive practical value of time series prediction, many excellent algorithms have been proposed. Most of these methods are developed assuming that massive labeled training data are available. However, this assumption might be invalid in some actual situations. To address this limitation, a transfer learning framework with deep architectures is proposed. Since convolutional neural network (CNN) owns favorable feature extraction capability and can implement parallelization more easily, we propose a deep transfer learning method resorting to the architecture of CNN, termed as DTr-CNN for short. It can effectively alleviate the available labeled data absence and leverage useful knowledge to the current prediction. Notably, in our method, transfer learning process is implemented across different datasets. For a given target domain, in real-world scenarios, relativity of truly available potential source datasets may not be obvious, which is challenging and rarely referred to in most existing time series prediction methods. Aiming at this problem, the incorporation of Dynamic Time Warping (DTW) and Jensen-Shannon (JS) divergence is adopted for the selection of the appropriate source domain. Effectiveness of the proposed method is empirically underpinned by the experiments conducted on one group of synthetic and two groups of practical datasets. Besides, an additional experiment on NN5 dataset is conducted. [ABSTRACT FROM AUTHOR]