New Observations on Zipf’s Law in Passwords.

Authors :
Hou, Zhenduo
Wang, Ding
Source :
IEEE Transactions on Information Forensics & Security; 2023, Vol. 18, p517-532, 16p
Publication Year :
2023

Abstract

As the password distribution lays the foundation for various lines of password research, characterizing it accurately has received considerable attention. At IEEE TIFS’17, Wang et al. proposed the CDF-Zipf distribution model, with the golden-section-search (GSS) fitting method used to find the optimal parameters. Their model has been adopted by over 120 password-related studies. In this paper, we address the remaining, fundamental goodness-of-fit issue of password distribution with a principled approach. First, we prove that the confidence level of the state-of-the-art Monte Carlo approach (MCA, for the goodness-of-fit test) converges asymptotically to 0. By experimenting on 228.92 million real-world passwords, we confirm Wang et al.’s conjecture on the effect of sample size, namely that minor deviations lead to statistical significance for large-scale datasets. We propose both absolute and relative deviation metrics, and find that 1% random deviations in both metrics suffice to reject CDF-Zipf. Second, we attempt to reduce the non-negligible gap between the empirical and fitted distributions (the maximum deviation of the cumulative distribution function (CDF) is 1.91% on average). We explore eight alternative distribution models in two coordinate systems, and find that three models are more accurate than CDF-Zipf, but none can pass MCA. In particular, we reveal that stretched-exponential, a variant of CDF-Zipf, can on average reduce the maximum CDF deviation from 1.91% to 1.25%. Third, to replace MCA, we introduce a new goodness-of-fit measure based on log-likelihoods. We find that stretched-exponential consistently has a larger log-likelihood than its counterparts. In all, stretched-exponential fits passwords better and further supports Zipf’s law in passwords. [ABSTRACT FROM AUTHOR]
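
Illustrative sketch (not part of the record or the paper): the abstract's "maximum CDF deviation" and golden-section-search fitting can be pictured roughly as follows. The sketch assumes the CDF-Zipf form F(r) = C * r^s over password ranks r, pins C so that F(N) = 1, and searches s to minimize the largest CDF gap. That objective, the pinning of C, the search interval, the toy counts, and all function names are assumptions made for illustration; the authors' actual fitting procedure may differ.

```python
import math

def empirical_cdf(counts):
    """Cumulative share of accounts covered by the top-r passwords, r = 1..N."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    cdf, running = [], 0
    for c in ordered:
        running += c
        cdf.append(running / total)
    return cdf

def max_cdf_deviation(cdf, C, s):
    """Largest absolute gap between the model C * r**s and the empirical CDF."""
    return max(abs(C * r ** s - y) for r, y in enumerate(cdf, start=1))

def golden_section_search(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2

def fit_exponent(cdf, lo=0.01, hi=2.0):
    """Assumed objective: choose s to minimize the max CDF gap, with C = N**(-s)."""
    n = len(cdf)
    objective = lambda s: max_cdf_deviation(cdf, n ** (-s), s)
    s = golden_section_search(objective, lo, hi)
    C = n ** (-s)
    return s, C, max_cdf_deviation(cdf, C, s)

# Hypothetical toy frequencies, not drawn from any real password dataset.
toy_counts = [50, 25, 12, 8, 5]
toy_cdf = empirical_cdf(toy_counts)
s_hat, C_hat, dev = fit_exponent(toy_cdf)
print(f"s = {s_hat:.3f}, C = {C_hat:.3f}, max CDF deviation = {dev:.3%}")
```

Pinning C keeps the search one-dimensional, which is what golden-section search requires; with C = N^(-s) each per-rank gap is V-shaped in s, so the maximum gap has a single minimum on the interval.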

Details

Language :
English
ISSN :
1556-6013
Volume :
18
Database :
Complementary Index
Journal :
IEEE Transactions on Information Forensics & Security
Publication Type :
Academic Journal
Accession number :
160906331
Full Text :
https://doi.org/10.1109/TIFS.2022.3176185