
Dictionary buildup and stability of word frequency in a specialized medical area

Authors :
John M. Long
Gertrude C. Levy
Howard J. Barnhard
Source :
American Documentation. 18:21-25
Publication Year :
1967
Publisher :
Wiley, 1967.

Abstract

This is a report of word usage in radiological (x-ray) patient records, drawn from a 5% sample of the annual case load at UAMC and totaling 100,000 words. Records were taken exactly as dictated. The study is part of an effort to develop an information retrieval (IR) system for patient data. The system "autocodes" (automatically stores) the physician's dictated findings and diagnoses in such a way that they can later be retrieved automatically. Some of our findings approximate results reported in the literature. For example, the introduction of new, different words levels off at about 2,500 words once 40,000 to 50,000 words of text have been analyzed. However, unclassified words continue to occur at a significant rate of almost 2% at the 100,000-word level, with a 1% noise level. Attempts to establish the rank order of words beyond the first several hundred have failed because about 70% of the words appear to occur with very low relative frequency (no more than once in 10,000 words). Thus, establishing files by rank order appears impractical, although filter lists (discard words) built by rank groups (words with nearly the same relative frequency) are quite practical. Additional data are presented and design implications are discussed.
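The measures named in the abstract (vocabulary growth as text accumulates, the share of low-frequency words, and discard lists built from rank groups) can be illustrated with a short Python sketch. It is not the authors' procedure, which the abstract does not describe. The tokenizer, the grouping of words by identical count as a stand-in for "nearly the same relative frequency", and all function names here are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def vocabulary_growth(tokens, step):
    """Return (tokens_seen, distinct_words) every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

def rank_groups(tokens):
    """Map each occurrence count to the words sharing it, most frequent first.
    Assumption: exact equal counts approximate 'nearly the same frequency'."""
    groups = {}
    for word, n in Counter(tokens).items():
        groups.setdefault(n, []).append(word)
    return dict(sorted(groups.items(), reverse=True))

def discard_list(tokens, min_count):
    """Union of all rank groups whose count is at least `min_count`."""
    return sorted(w for w, n in Counter(tokens).items() if n >= min_count)

def low_frequency_share(tokens, max_rel_freq):
    """Fraction of distinct words whose relative frequency <= max_rel_freq."""
    counts = Counter(tokens)
    total = len(tokens)
    low = sum(1 for n in counts.values() if n <= max_rel_freq * total)
    return low / len(counts)
```

For example, on the toy record "The chest is clear. The heart is normal; the lungs are clear." the growth curve sampled every 4 tokens is `[(4, 4), (8, 6), (12, 8)]`, and `discard_list(tokens, 2)` returns `['clear', 'is', 'the']`. On a real 100,000-word sample one would plot the growth curve to see where it flattens and call `low_frequency_share(tokens, 1 / 10_000)` to check the proportion of rare words.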

Details

ISSN :
1936-6108 and 0096-946X
Volume :
18
Database :
OpenAIRE
Journal :
American Documentation
Accession number :
edsair.doi...........821a9045bb633ffcbf8b8d45b562f85b
Full Text :
https://doi.org/10.1002/asi.5090180105