This study used multiscale convolutional neural networks (MCNNs) to extract a larger number of features and a long short-term memory (LSTM) network to rank time-series signals. Three simulated marine environments were examined, namely the Kinmen Sea, the southwestern sea, and the eastern sea areas of Taiwan. In addition, signal transmission and reception experiments were performed in a small fishpond and a swimming pool. To eliminate the intersymbol interference (ISI) induced by multipath propagation, a virtual time reversal mirror (VTRM) was used to optimize the data collected from the simulated underwater environments. The results show that models trained on the optimized data from the southwestern sea area could also demodulate the data collected from the Kinmen and eastern sea areas. In the simulations, when the signal-to-noise ratio (SNR) in the Kinmen Sea area was −14 and 14 dB, the bit error rate (BER) of this model was 0.00145 and 0.00019, respectively. In the experiments, when the MCNN–LSTM model was trained using fishpond data at transmit powers of 0.003 and 0.01 W, its BER was 0.000083 and 0.000025, respectively. When the model was trained using swimming pool data at transmit powers of 0.003 and 0.01 W, its BER was 0.000008 and 0.000004, respectively. The simulation and experimental results indicate that when model training and testing were performed using data collected from the same sea area, the MCNN–LSTM model was more accurate than the convolutional neural network (CNN)–LSTM model.
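The roles of the VTRM and the BER metric can be illustrated with a minimal sketch. It is not the pipeline used in this study: it assumes BPSK symbols, a hypothetical four-tap multipath channel that is known exactly (a practical VTRM would estimate the channel from a probe signal), and additive white Gaussian noise at a chosen SNR. All function names are illustrative. The received signal is convolved with the time-reversed channel, which concentrates the multipath energy at a single lag before a simple sign decision is made.

```python
import math
import random


def convolve(x, h):
    """Full linear convolution of two real sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the requested SNR (in dB)."""
    power = sum(v * v for v in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [v + rng.gauss(0.0, sigma) for v in signal]


def vtrm_focus(received, channel_estimate, n_symbols):
    """Convolve with the time-reversed channel and realign to the symbols.

    The effective channel becomes the autocorrelation of the channel,
    whose peak sits at lag len(channel_estimate) - 1.
    """
    reversed_h = channel_estimate[::-1]
    focused = convolve(received, reversed_h)
    delay = len(channel_estimate) - 1
    return focused[delay:delay + n_symbols]


def bit_error_rate(sent, decided):
    """Fraction of bits that differ between the two sequences."""
    errors = sum(1 for a, b in zip(sent, decided) if a != b)
    return errors / len(sent)


def simulate_ber(snr_db, n_bits=2000, channel=(1.0, 0.0, 0.0, 0.5), seed=0):
    """BER of sign decisions after VTRM focusing (assumed channel, BPSK)."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    symbols = [1.0 if b else -1.0 for b in bits]  # BPSK mapping (assumption)
    received = add_awgn(convolve(symbols, channel), snr_db, rng)
    focused = vtrm_focus(received, channel, n_bits)
    decided = [1 if v > 0 else 0 for v in focused]
    return bit_error_rate(bits, decided)


if __name__ == "__main__":
    for snr_db in (-14, 14):
        print(snr_db, simulate_ber(snr_db))
```

In this sketch the sign decision stands in for the learned demodulator. The BER values it prints depend on the assumed channel and seed and are not comparable with the values reported above.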