Dictionary buildup and stability of word frequency in a specialized medical area
- Source :
- American Documentation. 18:21-25
- Publication Year :
- 1967
- Publisher :
- Wiley, 1967.
Abstract
- This is a report of word usage in radiological (x-ray) patient records as found in a 5% sample of the annual case load at UAMC including 100,000 words. Records were taken exactly as dictated. The study is part of an effort to develop an IR system for patient data. The system “autocodes” (automatically stores) the physician's dictated findings and diagnoses in such a fashion that they can be retrieved again automatically. Some of our findings approximate results reported in the literature. For example, the rate of introduction of new different words levels off to about 2,500 words when 40,000 to 50,000 words of text have been analyzed. However, unclassified words continue to occur at a significant level of almost 2% at the 100,000 word level, with a 1% noise level. Attempts to establish the rank order of words beyond the first several hundred have failed because about 70% of the words appear to occur with such a low relative frequency (no more than one time in 10,000). Thus, establishing files by rank order appears impractical, even though filter lists (discard words) by rank groups (words with nearly the same relative frequency) are quite practical. Additional data are presented and design implications are discussed.
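The two measurements the abstract relies on, the growth of the distinct-word dictionary as more text is read and the share of words at or below a relative frequency of one in 10,000, can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the whitespace tokenization, the toy text, and the reading of "words" as distinct word types are all assumptions made for illustration.

```python
from collections import Counter


def vocabulary_growth(tokens, step):
    """Return (tokens_read, distinct_words_so_far) pairs, sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


def share_of_rare_types(tokens, one_in=10_000):
    """Fraction of distinct words occurring no more than once per `one_in` tokens.

    Assumption: "words" in the abstract is read here as distinct word types.
    The comparison uses integers to avoid floating-point rounding.
    """
    counts = Counter(tokens)
    total = len(tokens)
    rare = sum(1 for c in counts.values() if c * one_in <= total)
    return rare / len(counts)


# Hypothetical toy text, not data from the study.
tokens = "no fracture seen no acute fracture chest clear no effusion".split()

growth = vocabulary_growth(tokens, step=5)
rare_share = share_of_rare_types(tokens, one_in=10)
print(growth)
print(rare_share)
```

On a real corpus, a flattening growth curve would correspond to the leveling-off the abstract describes, and a large rare-word share would correspond to the difficulty of ranking words beyond the first several hundred.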
- Subjects :
- Polymers and Plastics
Computer science
business.industry
Sample (material)
Rank (computer programming)
Stability (learning theory)
computer.software_genre
Filter (higher-order function)
Frequency
Word lists by frequency
Word usage
Artificial intelligence
business
computer
Natural language processing
Word (group theory)
General Environmental Science
Details
- ISSN :
- 1936-6108 and 0096-946X
- Volume :
- 18
- Database :
- OpenAIRE
- Journal :
- American Documentation
- Accession number :
- edsair.doi...........821a9045bb633ffcbf8b8d45b562f85b
- Full Text :
- https://doi.org/10.1002/asi.5090180105