Back to Search Start Over

The textual characteristics of traditional and Open Access scientific journals are similar

Authors :
Hunter Lawrence
Cohen K Bretonnel
Verspoor Karin
Source :
BMC Bioinformatics, Vol 10, Iss 1, p 183 (2009)
Publication Year :
2009
Publisher :
BMC, 2009.

Abstract

Abstract Background Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption. Results We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities. Conclusion We did not find structural or semantic differences between the Open Access and traditional journal collections.

Details

Language :
English
ISSN :
14712105
Volume :
10
Issue :
1
Database :
Directory of Open Access Journals
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
edsdoj.f2dfea980d5b43de945b988e016fb829
Document Type :
article
Full Text :
https://doi.org/10.1186/1471-2105-10-183