Citationwalk: Network representation learning with scientific documents.

Authors :
Lee, Juhyun
Park, Sangsung
Lee, Junseok
Source :
Expert Systems with Applications. Oct2023, Vol. 227, pN.PAG-N.PAG. 1p.
Publication Year :
2023

Abstract

• The feature representation of the network was extracted from a randomly searched sequence of vertices.
• In networks of scientific documents, representation learning needs to consider similarity and citation relationships.
• This paper proposes a method to search and learn multiple paths connecting similar documents in a scientific document network.
• From scientific paper and patent datasets, we found empirical evidence that our algorithm is suitable for representation learning.
• This paper can contribute to preparing the basis for inducing innovation in science and technology from scientific documents.

A network is a structure that can represent organic relationships among observations. Network representation learning has the advantage of extracting latent features from a network. In recent years, various algorithms have been developed to embed network representations in a low-dimensional feature space. Conventional algorithms learn from a sequence of vertices randomly searched from a network and convert each vertex into a vector. This study focuses on the representation learning of document networks. It proposes a scalable network representation learning algorithm that preserves the structure and content of the document network and reduces data sparsity. To do this, we used shortest walks to extract sequences of vertices that reflect semantic similarity from a network. The shortest walk is a method for finding several shortest paths that connect similar vertices. The sequence of vertices was then fed into a language model, treating the sequence as a sentence. Scientific paper and patent datasets provided empirical evidence for the validity of our approach. The former, a bibliographic network of scientific papers, consists of 17,716 nodes and 105,734 edges. The latter, a network of patents describing energy technology, consists of 24,894 nodes and 28,314 edges. The results showed that the proposed algorithm exhibited the lowest error among the tested methods in multi-class classification and link prediction tasks. The results of these tasks provided evidence that our algorithm is suitable for extracting feature representations of document networks. [ABSTRACT FROM AUTHOR]
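The shortest-walk idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similarity is measured, how many paths are kept, or which language model is used. The sketch assumes an unweighted, undirected citation graph, Jaccard similarity over document tokens, and a hypothetical threshold of 0.5; all function and variable names are illustrative. It enumerates every shortest path between each pair of similar documents and returns the paths as "sentences".

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(adj, source, target):
    """Return every shortest path from source to target in an unweighted graph."""
    # Breadth-first search recording, for each vertex, its distance from the
    # source and all predecessors that lie on some shortest path to it.
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if target not in dist:
        return []
    # Follow predecessor lists backwards from the target to list the paths.
    paths = []
    stack = [[target]]
    while stack:
        partial = stack.pop()
        head = partial[-1]
        if head == source:
            paths.append(partial[::-1])
        else:
            for p in preds[head]:
                stack.append(partial + [p])
    return sorted(paths)


def jaccard(a, b):
    """Jaccard similarity of two token collections (an assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shortest_walk_corpus(adj, tokens, threshold=0.5):
    """Collect shortest paths between every pair of sufficiently similar documents."""
    corpus = []
    for u, v in combinations(sorted(adj), 2):
        if jaccard(tokens[u], tokens[v]) >= threshold:
            corpus.extend(all_shortest_paths(adj, u, v))
    return corpus


# Toy data (hypothetical): five documents linked by citations.
TOY_ADJ = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
TOY_TOKENS = {
    "A": ["graph", "embedding", "walk"],
    "B": ["patent", "energy"],
    "C": ["citation", "network"],
    "D": ["graph", "embedding", "walk", "citation"],
    "E": ["solar", "cell"],
}

if __name__ == "__main__":
    for sentence in shortest_walk_corpus(TOY_ADJ, TOY_TOKENS):
        print(" ".join(sentence))
```

In the toy data only documents A and D are similar enough to be paired, and two shortest paths connect them. Each resulting vertex sequence could then be passed as a sentence to a skip-gram-style embedding model; which model the paper uses is not stated in this record.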

Details

Language :
English
ISSN :
09574174
Volume :
227
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
164111250
Full Text :
https://doi.org/10.1016/j.eswa.2023.120372