WordPPR: A Researcher-Driven Computational Keyword Selection Method for Text Data Retrieval from Digital Media.

Authors :
Zhang, Yini
Chen, Fan
Suk, Jiyoun
Yue, Zhiying
Source :
Communication Methods & Measures; Oct-Dec 2024, Vol. 18 Issue 4, p332-348, 17p
Publication Year :
2024

Abstract

Despite the increasing use of digital media data in communication research, a central challenge persists – retrieving data with maximal accuracy and coverage. Our investigation of keyword-based data collection practices in extant communication research reveals a one-step process, whereas our cross-disciplinary literature review suggests an iterative query expansion process guided by human knowledge and computer intelligence. Hence, we introduce the WordPPR method for keyword selection and text data retrieval, which entails four steps: 1) collecting an initial dataset using core/seed keyword(s); 2) constructing a word graph based on the dataset; 3) applying the Personalized PageRank (PPR) algorithm to rank words in proximity to the seed keyword(s) and selecting new keywords that optimize retrieval precision and recall; 4) repeating steps 1–3 to determine if additional data collection is needed. Without requiring corpus-wide sampling/analysis or extensive manual annotation, this method is well suited for data collection from large-scale digital media corpora. Our simulation studies demonstrate its robustness against parameter choice and its improvement upon other methods in suggesting additional keywords. Its application in Twitter data retrieval is also provided. By advancing a more systematic approach to text data retrieval, this study contributes to improving digital media data retrieval practices in communication research and beyond. [ABSTRACT FROM AUTHOR]
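
Steps 2 and 3 of the abstract (word graph, then Personalized PageRank seeded on the core keywords) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the graph is built or how new keywords are chosen, so the document-level co-occurrence weighting, the `alpha` teleport probability, the top-k cutoff, and the function names `build_word_graph`, `personalized_pagerank`, and `suggest_keywords` are all assumptions made for illustration. The toy documents are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs):
    """Undirected co-occurrence graph (assumed construction): the weight of an
    edge is the number of documents in which the two words appear together."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, seeds, alpha=0.15, iters=100):
    """Power iteration for Personalized PageRank. At each step the random
    walker teleports back to the seed words with probability alpha."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("no seed keyword occurs in the word graph")
    teleport = {n: 0.0 for n in nodes}
    for s in seeds:
        teleport[s] = 1.0 / len(seeds)
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: alpha * teleport[n] for n in nodes}
        for n in nodes:
            out_weight = sum(graph[n].values())
            for m, w in graph[n].items():
                nxt[m] += (1.0 - alpha) * rank[n] * w / out_weight
        rank = nxt
    return rank


def suggest_keywords(docs, seeds, k=3):
    """Rank non-seed words by PPR score and return the top k candidates.
    A researcher would still vet these by hand (assumed workflow)."""
    ranks = personalized_pagerank(build_word_graph(docs), seeds)
    candidates = [(w, r) for w, r in ranks.items() if w not in seeds]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]


# Invented toy corpus standing in for the initial seed-keyword dataset.
toy_docs = [
    "vaccine covid shot",
    "vaccine covid booster",
    "covid booster shot",
    "football game score",
    "football game",
]
for word, score in suggest_keywords(toy_docs, ["vaccine"]):
    print(word, round(score, 3))
```

In this sketch, words that are unreachable from the seed in the graph receive a score of zero, which is the property that lets the ranking concentrate on vocabulary near the seed keyword(s). The selection criterion in the published method targets retrieval precision and recall, which a plain top-k cutoff does not capture.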

Details

Language :
English
ISSN :
19312458
Volume :
18
Issue :
4
Database :
Complementary Index
Journal :
Communication Methods & Measures
Publication Type :
Academic Journal
Accession number :
180919673
Full Text :
https://doi.org/10.1080/19312458.2023.2278177