1. Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.
- Author
- Wagner M, Vicinus B, Muthra ST, Richards TA, Linder R, Frick VO, Groh A, Rubie C, and Weichert F
- Subjects
- Animals, Humans, Data Mining methods, Internet, Neoplasms, PubMed
- Abstract
Background: The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing which is neither unitary, transcending social situation nor defined by a timeless idea is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining. Methods: A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL Scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features. Results: The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation were coherent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable. Conclusions: The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable. (Copyright © 2016 Elsevier Ltd. All rights reserved.)
- Published
- 2016