Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings
- Authors
Tyler J. Gray, Peter Sheridan Dodds, and Christopher M. Danforth
- Subjects
FOS: Computer and information sciences, Root (linguistics), Computer science, Entropy, Kernel Functions, Social Sciences, 02 engineering and technology, Sociology, 0202 electrical engineering, electronic engineering, information engineering, Psychology, Operator Theory, Language, Multidisciplinary, Computer Science - Computation and Language, Physics, Social Communication, Eukaryota, 06 humanities and the arts, Spelling, Linguistics, Separation Processes, Social Networks, 0602 languages and literature, Physical Sciences, Medicine, Thermodynamics, 020201 artificial intelligence & image processing, Written language, Information Technology, Computation and Language (cs.CL), Network Analysis, Research Article, Physics - Physics and Society, Computer and Information Sciences, Science, Twitter, FOS: Physical sciences, Physics and Society (physics.soc-ph), Research and Analysis Methods, Cnidaria, Phonetics, Entropy (information theory), Animals, Humans, Distillation, Natural Language Processing, 060201 languages & linguistics, Two parameter, Cognitive Psychology, Organisms, Biology and Life Sciences, Invertebrates, Communications, Reading, Cognitive Science, Jellyfish, Social Media, Mathematics, Spoken language, Neuroscience
- Abstract
Stretched words like "heellllp" or "heyyyyy" are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of "stretchable words" found in roughly 100 billion tweets authored over an 8-year period. We introduce two central parameters, "balance" and "stretch", that capture their main characteristics, and explore their dynamics by creating visual tools we call "balance plots" and "spelling trees". We discuss how the tools and methods we develop here could be used to study the statistical patterns of mistypings and misspellings, as well as potential applications in augmenting dictionaries, improving language processing, and any area where sequence construction matters, such as genetics.
Comment: 18 pages, 18 figures, and 9 tables. Online appendices at http://compstorylab.org/stretchablewords/
- Published
- 2019
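The abstract names "balance" and "stretch" but does not define them, so the following is only a minimal Python sketch of the kind of letter-repetition bookkeeping involved. It is not the authors' method. It assumes a naive run-length view of a token, and the helpers `run_lengths`, `collapse`, and `evenness` are hypothetical. `evenness` is an illustrative normalized-entropy score for how evenly the extra letters are spread across positions. It is not the paper's balance parameter.

```python
import math
from itertools import groupby


def run_lengths(token: str) -> list[tuple[str, int]]:
    """Split a token into (letter, run length) pairs.

    Example: 'heyyy' -> [('h', 1), ('e', 1), ('y', 3)].
    """
    return [(ch, len(list(group))) for ch, group in groupby(token.lower())]


def collapse(token: str) -> str:
    """Hypothetical base form: reduce every run of a repeated letter to one copy.

    Naive assumption: this also merges genuine double letters ('hello' -> 'helo').
    """
    return "".join(ch for ch, _ in run_lengths(token))


def evenness(token: str) -> float:
    """Illustrative only (NOT the paper's 'balance'): normalized Shannon entropy
    of the extra letters across runs.

    Returns 1.0 when extra letters are spread evenly over all positions and
    0.0 when they are concentrated in a single position (or there are none).
    """
    runs = run_lengths(token)
    extras = [n - 1 for _, n in runs]  # letters beyond the first in each run
    total = sum(extras)
    if total == 0 or len(runs) < 2:
        return 0.0
    probs = [e / total for e in extras if e > 0]
    entropy = sum(-p * math.log2(p) for p in probs)
    return entropy / math.log2(len(runs))  # normalize by maximum possible entropy


if __name__ == "__main__":
    for word in ["heellllp", "heyyyyy", "hheellpp"]:
        print(word, collapse(word), round(evenness(word), 3))
```

This run-length view has clear limits. It cannot recover a root word's legitimate double letters, and it does not capture repeated multi-letter units such as the "ha" in "hahahaha". The sketch is meant only to make the idea of per-letter stretching concrete.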