New Observations on Zipf’s Law in Passwords.
- Source :
- IEEE Transactions on Information Forensics & Security; 2023, Vol. 18, p517-532, 16p
- Publication Year :
- 2023
Abstract
- As password distribution lays the foundation for various password research, accurately characterizing it receives considerable attention. At IEEE TIFS’17, Wang et al. proposed the CDF-Zipf distribution model with the golden-section-search (GSS) fitting method to find the optimal parameters. Their model has been adopted by over 120 password-related studies. In this paper, we address their remaining, fundamental goodness-of-fit issue of password distribution in a principled approach. First, we prove that the confidence level of the state-of-the-art Monte Carlo approach (MCA, for the goodness-of-fit test) converges asymptotically to 0. By experimenting on 228.92 million real-world passwords, we confirm Wang et al.’s conjecture on the effect of sample size that minor deviations would lead to statistical significance for large-scale datasets. We propose both absolute and relative deviation metrics, and find that 1% random deviations in both metrics suffice to reject CDF-Zipf. Second, we attempt to reduce the non-negligible gap between the empirical and fitted distributions (with the maximum deviation of cumulative distribution function (CDF) being 1.91% on average). We explore eight alternative distribution models in two coordinate systems, and find that three models are more accurate than CDF-Zipf, but none can pass MCA. Particularly, we reveal that stretched-exponential, a variant of CDF-Zipf, can on average reduce the maximum CDF deviation from 1.91% to 1.25%. Third, to replace MCA, we introduce a new goodness-of-fit measure based on log-likelihoods. We find that stretched-exponential constantly has a larger log-likelihood than its counterparts. In all, stretched-exponential fits passwords better and further supports Zipf’s law in passwords. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 1556-6013
- Volume :
- 18
- Database :
- Complementary Index
- Journal :
- IEEE Transactions on Information Forensics & Security
- Publication Type :
- Academic Journal
- Accession number :
- 160906331
- Full Text :
- https://doi.org/10.1109/TIFS.2022.3176185