Dose multimodal machine translation can improve translation performance?

Authors :: Cui, ShaoDong
Duan, Kaibo
Ma, Wen
Shinnou, Hiroyuki
Source :: Neural Computing & Applications. Aug 2024, Vol. 36 Issue 22, p13853-13864. 12p.
Publication Year :: 2024
Abstract: Multimodal machine translation (MMT) is a method that uses visual information to guide text translation. However, recent studies have raised controversy over how much MMT actually contributes to improving text translation. To explore whether the MMT model can improve translation performance, we evaluate it against a current Neural Machine Translation (NMT) system on the Multi30K dataset. Specifically, we judge the performance of the MMT model by comparing it with that of the NMT model. We also conduct text and multimodal degradation experiments to verify whether vision plays a role, and we examine the performance of both models on sentences of different lengths to clarify the pros and cons of the MMT model. We find that the current NMT model outperforms the MMT model, suggesting that the impact of visual features may be limited. Visual features seem to exert influence primarily when a substantial number of words in the source text are masked. [ABSTRACT FROM AUTHOR]
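
Illustrative note: the text degradation experiment mentioned in the abstract masks words in the source sentence. The abstract does not specify the masking scheme, so the following is only a minimal sketch under assumptions: uniform random masking of whitespace-separated tokens, a hypothetical function name `mask_source_tokens`, and a placeholder token `[MASK]`. It is not the authors' implementation.

```python
import random


def mask_source_tokens(sentence, mask_ratio, mask_token="[MASK]", seed=None):
    """Replace a fraction of the tokens in `sentence` with `mask_token`.

    Assumptions (not taken from the paper): tokens are whitespace-separated,
    positions are chosen uniformly at random, and the number of masked
    tokens is floor(len(tokens) * mask_ratio).
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    tokens = sentence.split()
    rng = random.Random(seed)  # local generator so results are reproducible
    n_mask = int(len(tokens) * mask_ratio)
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    return " ".join(
        mask_token if i in masked_positions else tok
        for i, tok in enumerate(tokens)
    )


if __name__ == "__main__":
    source = "a man rides a red bike down the street"
    for ratio in (0.0, 0.25, 0.5, 1.0):
        print(ratio, "->", mask_source_tokens(source, ratio, seed=0))
```

Raising `mask_ratio` removes more textual context, which is the condition under which the abstract reports that visual features start to matter.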