NLP and Deep Learning for Phishing and Social Engineering Detection: A Systematic Review (2018–2026)
DOI:
https://doi.org/10.63956/jitar.v1i2.39
Keywords:
phishing, social engineering, natural language processing, deep learning, transformer, systematic literature review
Abstract
Phishing and social engineering continue to escalate as digital public services and online commerce expand, with attackers exploiting linguistic deception, impersonation cues, and routine “click-and-comply” behavior across email, SMS, and voice channels.
Objective: This study systematically synthesizes research published between 2018 and 2026 on phishing and social engineering detection using natural language processing (NLP) and deep learning. It addresses two problems that limit robust comparison and practical translation: evidence fragmented across channels and inconsistent terminology.
Method: A systematic literature review was conducted through structured database searches and snowballing, followed by deduplication, staged screening, and eligibility assessment. Studies were analyzed using a standardized extraction form, then synthesized via descriptive mapping and thematic analysis to develop a method taxonomy and to examine evaluation rigor and operational readiness.
Findings: The evidence base is dominated by email and business email compromise (BEC) detection, while smishing and vishing remain comparatively underrepresented. Methods increasingly rely on contextual language representations and hybrid architectures to capture semantic and local deception patterns. However, evaluation practices are heterogeneous and often provide limited evidence on cross-dataset generalization, temporal robustness, and deployability. Socio-technical findings also indicate that human susceptibility and system/client workflow vulnerabilities can moderate the real-world effectiveness of technical defenses.
Implications: The proposed taxonomy supports method selection by channel and highlights actionable priorities for practice and policy, including standardized reporting, cross-dataset and temporal validation, robustness testing, and integration with operational security workflows.
Originality: This review adds value by consolidating detection and deception-centric strands through explicit inclusion of impersonation, fraud email, and scam terminology, and by linking methodological choices to evaluation rigor and deployment constraints across email, SMS, and voice contexts.
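The cross-dataset and temporal validation named among the priorities above can be illustrated with a minimal Python sketch. It is not taken from the review: the `Message` record, the source names `email_a` and `sms_b`, the toy messages, and both helper functions are hypothetical and only show how such splits might be constructed before any detector is trained.

```python
from datetime import date
from typing import Iterator, List, NamedTuple, Tuple


class Message(NamedTuple):
    """Hypothetical record for one labeled message."""
    text: str
    label: int       # 1 = phishing, 0 = legitimate
    received: date   # when the message was received
    source: str      # hypothetical dataset name


def temporal_split(
    messages: List[Message], cutoff: date
) -> Tuple[List[Message], List[Message]]:
    """Train on messages received before `cutoff`, test on the rest."""
    train = [m for m in messages if m.received < cutoff]
    test = [m for m in messages if m.received >= cutoff]
    return train, test


def cross_dataset_splits(
    messages: List[Message],
) -> Iterator[Tuple[str, List[Message], List[Message]]]:
    """Hold out one source at a time: train on the others, test on it."""
    for held_out in sorted({m.source for m in messages}):
        train = [m for m in messages if m.source != held_out]
        test = [m for m in messages if m.source == held_out]
        yield held_out, train, test


# Toy data, invented purely for illustration.
demo = [
    Message("Verify your account now", 1, date(2021, 3, 1), "email_a"),
    Message("Meeting moved to 3pm", 0, date(2021, 6, 1), "email_a"),
    Message("Your parcel is held, pay the fee", 1, date(2023, 2, 1), "sms_b"),
    Message("See you at lunch", 0, date(2023, 5, 1), "sms_b"),
]

if __name__ == "__main__":
    train, test = temporal_split(demo, date(2022, 1, 1))
    print(len(train), "train /", len(test), "test (temporal)")
    for held_out, tr, te in cross_dataset_splits(demo):
        print("held out:", held_out, "| train:", len(tr), "| test:", len(te))
```

Reporting a detector's score on each such split, alongside the usual random split, would be one way to surface the generalization and temporal-robustness gaps described in the findings.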
License
Copyright (c) 2025 Ovan Sunarto Pulu, Muhammad Fadly

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
