Improving neural sentence alignment with word translation
- Source :
- Frontiers of Computer Science. 15
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- Sentence alignment is a basic task in natural language processing that aims to extract high-quality parallel sentences automatically. Motivated by the observation that aligned sentence pairs contain more aligned words than unaligned pairs do, we treat word translation as a particularly useful source of external knowledge. In this paper, we show how to explicitly integrate word translation into neural sentence alignment. Specifically, we propose three cross-lingual encoders that incorporate word translation: 1) a Mixed Encoder that learns annotation vectors for words and their translations over sequences in which words and their translations alternate; 2) a Factored Encoder that treats word translations as features and encodes words and their translations by concatenating their embeddings; and 3) a Gated Encoder that uses a gating mechanism to selectively control how much word translation information is passed forward. Experiments on the NIST MT and OpenSubtitles Chinese-English datasets, in both non-monotonic and monotonic scenarios, show that all the proposed encoders significantly improve sentence alignment performance.
- Subjects :
- General Computer Science
business.industry
Computer science
020207 software engineering
02 engineering and technology
Translation (geometry)
computer.software_genre
Theoretical Computer Science
Task (project management)
Annotation
0202 electrical engineering, electronic engineering, information engineering
NIST
020201 artificial intelligence & image processing
Artificial intelligence
Control (linguistics)
business
computer
Encoder
Natural language processing
Sentence
Word (computer architecture)
Details
- ISSN :
- 2095-2236 and 2095-2228
- Volume :
- 15
- Database :
- OpenAIRE
- Journal :
- Frontiers of Computer Science
- Accession number :
- edsair.doi...........4a15f34f8375df349dc72de6fef90c1f
- Full Text :
- https://doi.org/10.1007/s11704-019-9164-3
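Illustrative note: the abstract names three ways of feeding word translations to an encoder (mixing, concatenation, gating) but gives no formulas. The sketch below shows only the input-combination step for each idea in plain Python. All function names, the per-dimension sigmoid gate, and the additive form `word + gate * translation` are assumptions made for illustration; they are not taken from the paper and may differ from the authors' actual encoders.

```python
import math


def mix_alternately(words, translations):
    """Mixed-style input: each source word is followed by its translation."""
    mixed = []
    for word, translation in zip(words, translations):
        mixed.extend([word, translation])
    return mixed


def factored_concat(word_vec, trans_vec):
    """Factored-style input: concatenate word and translation embeddings."""
    return list(word_vec) + list(trans_vec)


def gated_combine(word_vec, trans_vec, gate_weights, gate_bias):
    """Gated-style input (assumed form): a sigmoid gate per dimension
    scales the translation embedding before it is added to the word
    embedding. gate_weights has one row per output dimension, each row
    as long as the concatenated [word_vec; trans_vec] vector."""
    features = list(word_vec) + list(trans_vec)
    combined = []
    for i, (w, t) in enumerate(zip(word_vec, trans_vec)):
        score = sum(a * x for a, x in zip(gate_weights[i], features)) + gate_bias[i]
        gate = 1.0 / (1.0 + math.exp(-score))  # value in (0, 1)
        combined.append(w + gate * t)
    return combined
```

In this sketch a gate near 0 suppresses the translation signal and a gate near 1 passes it through almost unchanged, which is one simple reading of "selectively control the amount of word translations moving forward".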