ElZemity, Adel, Arief, Budi and Li, Shujun (2025) Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data. In: Laborde, R., et al. (eds.) Computer Security. ESORICS 2025 International Workshops. Lecture Notes in Computer Science. Springer, Cham. ISBN 978-3-032-16091-1, E-ISBN 978-3-032-16092-8. doi:10.1007/978-3-032-16092-8_19. (KAR id:114620)
PDF, Author's Accepted Manuscript (English, 626 kB)

Official URL: https://doi.org/10.1007/978-3-032-16092-8_19
Abstract
Large language models (LLMs) have been applied in many domains, including cyber security, where they present significant opportunities, such as enhancing threat analysis and malware detection, but can also introduce critical risks and safety concerns, including potential personal data leakage and the automated generation of new malware. Building on recent findings that fine-tuning LLMs with pseudo-malicious cyber security data significantly compromises their safety, this paper presents a comprehensive validation and extension of these safety risks using a different evaluation framework. We employ the garak red teaming framework with the OWASP Top 10 for LLM Applications to assess four open-source LLMs: Mistral 7B, Llama 3 8B, Gemma 2 9B, and DeepSeek R1 8B. Our evaluation confirms and extends previous findings, showing that fine-tuning reduces safety resilience across all tested LLMs (e.g., the failure rate of Mistral 7B against prompt injection rises from 9.1% to 68.7%). We further propose and evaluate a novel safety alignment approach that carefully rewords instruction-response pairs to include explicit safety precautions and ethical considerations, demonstrating that model safety can be maintained or even improved while preserving technical utility. This work validates previous safety concerns through independent evaluation, introduces new methods for mitigating these risks, and offers a practical path towards safer fine-tuning methodologies, contributing to the development of secure, trustworthy, and ethically aligned LLMs.
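The evaluation described in the abstract is driven by garak, an open-source LLM vulnerability scanner. Below is a minimal sketch of that step, assuming garak is installed (`pip install garak`) and using its documented `--model_type`, `--model_name` and `--probes` flags; the probe and model chosen here are illustrative examples rather than the paper's exact configuration, and the mapping of garak probes onto the OWASP Top 10 for LLM Applications is not reproduced here.

```python
import subprocess

# Run garak's prompt-injection probes against a Hugging Face model.
# The model and probe names below are examples only; substitute the
# fine-tuned checkpoint and probe set actually under evaluation.
subprocess.run(
    [
        "python", "-m", "garak",
        "--model_type", "huggingface",
        "--model_name", "mistralai/Mistral-7B-Instruct-v0.2",
        "--probes", "promptinject",
    ],
    check=True,
)
```

garak records the outcome of each probe attempt in a report file, from which per-probe failure rates such as the prompt-injection increase from 9.1% to 68.7% quoted above can be tallied.

The proposed mitigation rewords each instruction-response pair so that it carries explicit safety precautions and ethical considerations before fine-tuning. The sketch below illustrates the general idea only; the preamble text, the `reword_pair` helper, and the example pair are all hypothetical and do not reproduce the authors' actual rewording procedure.

```python
SAFETY_PREFIX = (
    "For use only in authorised security testing within a controlled "
    "environment: "
)
ETHICS_NOTE = (
    "\n\nNote: only run this against systems you are explicitly authorised "
    "to test, and handle any results in line with applicable law and policy."
)

def reword_pair(instruction: str, response: str) -> dict:
    """Wrap one instruction-response pair with explicit safety framing."""
    return {
        "instruction": SAFETY_PREFIX + instruction,
        "response": response + ETHICS_NOTE,
    }

if __name__ == "__main__":
    # Invented example pair, shown only to make the transformation concrete.
    print(reword_pair(
        "Write an Nmap command that enumerates open ports on a host.",
        "nmap -sV -p- 10.0.0.5",
    ))
```

Pairs rewritten in this spirit keep their technical content intact, which is consistent with the abstract's claim that safety can be improved while preserving utility.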
| Item Type: | Conference proceeding |
|---|---|
| DOI/Identification number: | 10.1007/978-3-032-16092-8_19 |
| Uncontrolled keywords: | Pseudo-Malicious, Large Language Models, Safety Alignment, Fine-Tuning, OWASP |
| Subjects: | Q Science > QA Mathematics (inc. Computing science) |
| Institutional Unit: | Schools > School of Computing; Institutes > Institute of Cyber Security for Society |
| Funders: | Engineering and Physical Sciences Research Council (https://ror.org/0439y7842) |
| Depositing User: | Budi Arief |
| Date Deposited: | 08 May 2026 15:51 UTC |
| Last Modified: | 08 May 2026 16:01 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/114620 |