What Do Language Representations Really Represent?
- Source :
- Bjerva, J, Östling, R, Veiga, M H, Tiedemann, J & Augenstein, I 2019, 'What do language representations really represent?', Computational Linguistics, vol. 45, no. 2, pp. 381-389. https://doi.org/10.1162/coli_a_00351
- Publication Year :
- 2019
Abstract
- A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on the one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships (a convenient benchmark used for evaluation in previous work) appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
- 8 pages, accepted for publication in Computational Linguistics (squib)
- Subjects :
- Text corpus
FOS: Computer and information sciences
Linguistics and Language
Computer science
530 Physics
1702 Artificial Intelligence
02 engineering and technology
Language and Linguistics
Language Technology (Computational Linguistics)
computational linguistics
03 medical and health sciences
representation learning
0302 clinical medicine
Artificial Intelligence
0202 electrical engineering, electronic engineering, information engineering
1706 Computer Science Applications
6121 Languages
natural language processing
Språkteknologi (språkvetenskaplig databehandling)
1203 Language and Linguistics
Computer Science - Computation and Language
linguistic typology
lcsh:P98-98.5
113 Computer and information sciences
16. Peace & justice
Computer Science Applications
3310 Linguistics and Language
10231 Institute for Computational Science
Language technology
030221 ophthalmology & optometry
020201 artificial intelligence & image processing
Language model
lcsh:Computational linguistics. Natural language processing
language representations
Computation and Language (cs.CL)
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Computational Linguistics, vol. 45, no. 2, pp. 381-389 (2019)
- Accession number :
- edsair.doi.dedup.....9f0b999299520ff505ce015e66270f1a
- Full Text :
- https://doi.org/10.5167/uzh-185185