PassTSL: Modeling Human-Created Passwords Through Two-Stage Learning

Li, Haozhang, Wang, Yangde, Qiu, Weidong, Li, Shujun, Tang, Peng (2024) PassTSL: Modeling Human-Created Passwords Through Two-Stage Learning. In: Information Security and Privacy: 29th Australasian Conference, ACISP 2024, Sydney, NSW, Australia, July 15–17, 2024, Proceedings, Part III. Lecture Notes in Computer Science. pp. 404-423. Springer, Singapore. ISBN 978-981-97-5100-6. E-ISBN 978-981-97-5101-3. (doi:10.1007/978-981-97-5101-3_22) (KAR id:108760)

File: PDF, Author's Accepted Manuscript. Language: English
Official URL: https://doi.org/10.1007/978-981-97-5101-3_22

Abstract

Textual passwords are still the most widely used user authentication mechanism. Due to the close connections between textual passwords and natural languages, advanced technologies in natural language processing (NLP) and machine learning (ML) could be used to model passwords for different purposes, such as studying human password-creation behaviors and developing more advanced password cracking methods that inform better defence mechanisms. In this paper, we propose PassTSL (modeling human-created Passwords through Two-Stage Learning), inspired by the popular pretraining-finetuning framework in NLP and deep learning (DL). We report how different pretraining settings affected PassTSL and demonstrate its effectiveness by applying it to six large leaked password databases. Experimental results showed that it outperforms five state-of-the-art (SOTA) password cracking methods on password guessing by a significant margin, ranging from 4.11% to 64.69% at the maximum point. Based on PassTSL, we also implemented a password strength meter (PSM). Our experiments showed that it estimates password strength more accurately than two other SOTA PSMs, a neural-network-based method and zxcvbn: at the same rate of safe errors (underestimating the password strength), it causes fewer unsafe errors (overestimating the password strength). Furthermore, we explored multiple finetuning settings, and our evaluations showed that even a small amount of additional training data, e.g., only 0.1% of the pretrained data, can lead to over 3% improvement in password guessing on average. We also proposed a heuristic approach to selecting finetuning passwords based on Jensen-Shannon (JS) divergence, and experimental results validated its usefulness. In summary, our contributions demonstrate the potential and feasibility of applying advanced NLP and ML methods to password modeling and cracking.
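
As a rough illustration of the JS-divergence idea mentioned in the abstract, the sketch below ranks candidate password sets by how close their character distributions are to a sample from a target set. The abstract does not say which features the paper compares, so the use of single-character frequencies, the function names (`char_distribution`, `js_divergence`, `rank_candidates`), and the toy data are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def char_distribution(passwords):
    """Relative frequency of each character across a list of passwords.

    Assumption: character frequencies stand in for whatever features the
    paper actually compares; the abstract does not specify them.
    """
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-empty password")
    return {ch: n / total for ch, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1]: 0 for identical
    distributions and 1 for distributions with disjoint support.
    """
    js = 0.0
    for x in set(p) | set(q):
        px = p.get(x, 0.0)
        qx = q.get(x, 0.0)
        mx = 0.5 * (px + qx)  # mixture distribution M = (P + Q) / 2
        if px > 0:
            js += 0.5 * px * math.log2(px / mx)
        if qx > 0:
            js += 0.5 * qx * math.log2(qx / mx)
    return js


def rank_candidates(target_sample, candidates):
    """Sort hypothetical candidate sets by divergence from the target sample.

    `candidates` maps a set name to a list of passwords; lower divergence
    means the set looks more like the target.
    """
    target = char_distribution(target_sample)
    scored = [
        (name, js_divergence(target, char_distribution(pws)))
        for name, pws in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy data, invented for this example only.
    ranking = rank_candidates(
        ["abc123"],
        {"near": ["abc321"], "far": ["zzzz"]},
    )
    for name, score in ranking:
        print(name, round(score, 4))
```

Under this assumption, the candidate set with the lowest divergence would be the most promising source of finetuning passwords for the target; a real pipeline would likely compare richer statistics than single characters.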

Item Type:	Conference or workshop item (Paper)
DOI/Identification number:	10.1007/978-981-97-5101-3_22
Uncontrolled keywords:	Password modeling, Natural language processing, Machine learning, Password strength meters
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 75 Electronic computers. Computer science
	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.76.E95 Expert Systems (Intelligent Knowledge Based Systems)
	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.87 Neural computers, neural networks
	T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800 Electronics > TK7880 Applications of electronics > TK7882.P3 Pattern recognition systems
Institutional Unit:	Schools > School of Computing
	Institutes > Institute of Cyber Security for Society
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
	University-wide institutes > Institute of Cyber Security for Society
Funders:	University of Kent (https://ror.org/00xkeyj56)
Depositing User:	Shujun Li
Date Deposited:	15 Feb 2025 18:47 UTC
Last Modified:	22 Jul 2025 09:22 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/108760 (The current URI for this page, for reference purposes)

University of Kent Author Information

Li, Shujun.

Creator's ORCID:	https://orcid.org/0000-0001-5628-7328
