
English WordNet Taxonomic Random Walk Pseudo-Corpora

Authors :
Klubicka, Filip
Maldonado, Alfredo
Mahalunkar, Abhijit
Kelleher, John D.
ADAPT Centre for Digital Content Technology
SFI Research Centres Programme
Source :
Conference papers
Publication Year :
2020
Publisher :
Technological University Dublin, 2020.

Abstract

This is a resource description paper that describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of different pseudo-corpora. We find that different combinations of the walk’s hyperparameters result in varying statistical properties of the generated pseudo-corpora. We have published a total of 81 pseudo-corpora that we have used in our previous research, but have not exhausted all possible combinations of hyperparameters, which is why we have also published a codebase that allows the generation of additional WordNet taxonomic pseudo-corpora as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
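To illustrate the idea in the abstract, the following is a minimal sketch of a taxonomic random walk, not the authors' published codebase. It walks over a small hand-made taxonomy (`TOY_TAXONOMY`, a hypothetical stand-in for the English WordNet hierarchy) and emits each walk as one pseudo-sentence. The function names and the hyperparameters `stop_prob`, `min_len`, and `seed` are assumptions chosen for illustration; the record does not state which hyperparameters the actual implementation exposes.

```python
import random

# Toy hypernym -> hyponyms map; hypothetical data standing in for WordNet.
TOY_TAXONOMY = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree", "flower"],
    "dog": [], "cat": [], "tree": [], "flower": [],
}


def build_neighbours(taxonomy):
    """Return an undirected adjacency map so a walk can move up or down."""
    neighbours = {node: set(children) for node, children in taxonomy.items()}
    for parent, children in taxonomy.items():
        for child in children:
            neighbours.setdefault(child, set()).add(parent)
    # Sort for reproducible choices under a fixed seed.
    return {node: sorted(adj) for node, adj in neighbours.items()}


def random_walk_sentence(neighbours, start, stop_prob, min_len, rng):
    """Walk from `start`; once `min_len` nodes are visited, stop with
    probability `stop_prob` before each further step."""
    sentence = [start]
    current = start
    while True:
        if len(sentence) >= min_len and rng.random() < stop_prob:
            break
        options = neighbours[current]
        if not options:  # isolated node: nowhere to go
            break
        current = rng.choice(options)
        sentence.append(current)
    return sentence


def generate_pseudo_corpus(taxonomy, n_sentences, stop_prob=0.15,
                           min_len=2, seed=0):
    """Generate `n_sentences` walks, each starting at a random node."""
    if not 0 < stop_prob <= 1:
        raise ValueError("stop_prob must be in (0, 1] so walks terminate")
    rng = random.Random(seed)
    neighbours = build_neighbours(taxonomy)
    nodes = sorted(neighbours)
    return [
        random_walk_sentence(neighbours, rng.choice(nodes),
                             stop_prob, min_len, rng)
        for _ in range(n_sentences)
    ]


if __name__ == "__main__":
    for sentence in generate_pseudo_corpus(TOY_TAXONOMY, n_sentences=5):
        print(" ".join(sentence))
```

Under these assumptions, changing `stop_prob` or `min_len` alters the sentence-length distribution of the output, which is one way that hyperparameter choices could shape the statistical properties of a pseudo-corpus. Each printed line is a space-separated sequence of node names that could, in principle, be passed to a word embedding trainer as a sentence.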

Details

Language :
English
Database :
OpenAIRE
Journal :
Conference papers
Accession number :
edsair.doi.dedup.....73d0c899a954356cdb96c332b90164bf
Full Text :
https://doi.org/10.21427/qvgt-zn56