As a cornerstone of the fourth industrial revolution, the industrial Internet transforms isolated industrial systems into connected networks and is an important direction for the expansion of digital industrialization. Anomaly detection in the industrial Internet of Things (IoT) environment is of great significance for automated decision-making. Most existing methods fail to effectively consider the complex, unknown topological relationships between sensors and the multi-scale patterns inherent in industrial IoT temporal data. To address this problem, MSTSAD, an unsupervised anomaly detection method with multi-scale spatiotemporal feature fusion for industrial IoT temporal data, is proposed. First, a novel bidirectional spatiotemporal feature extraction module is constructed to sequentially capture the correlations and bidirectional dependencies among multiple time series. Second, a multi-scale gated temporal convolutional neural network is designed to adaptively extract multi-scale temporal features of the time series, and a dual affine projection is introduced to cross-fuse the multi-scale temporal features with the spatiotemporal features, strengthening the model's feature extraction from the original data. Finally, a variational autoencoder combined with adversarial training is proposed to amplify the reconstruction error of anomalies and improve the model's robustness to noise in the training data, which enhances the model's ability to discriminate anomalous data. Experiments on four publicly available datasets (GPW, ECG5000, Occupancy, and SWaT) against five state-of-the-art methods show that MSTSAD improves the F1 score by 0.015-0.047.
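The idea of a multi-scale gated temporal convolution can be illustrated with a minimal sketch. The abstract does not specify the MSTSAD architecture, so everything here is an assumption made for illustration: the function names, the fixed (not learned) weights, the use of dilation to define the scales, and the WaveNet-style gate `tanh(filter) * sigmoid(gate)`. It shows only how one series can yield one gated feature sequence per temporal scale.

```python
import math


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the series are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_temporal_conv(x, filter_w, gate_w, dilation):
    """Gated activation: tanh(filter branch) scaled by sigmoid(gate branch)."""
    f = causal_dilated_conv(x, filter_w, dilation)
    g = causal_dilated_conv(x, gate_w, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def multi_scale_features(x, filter_w, gate_w, dilations=(1, 2, 4)):
    """Return one gated feature sequence per dilation (hypothetical scales)."""
    return [gated_temporal_conv(x, filter_w, gate_w, d) for d in dilations]


# Toy sensor series; the weights are arbitrary illustrative values.
series = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = multi_scale_features(series, filter_w=[0.5, 0.5], gate_w=[1.0, -1.0])
```

In a trained model the filter and gate weights would be learned, and the per-scale outputs would be fused with other features rather than returned as separate lists.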