Nipuna Sankalpa Thalpage, Eranga Jayarathne
Cite: Thalpage, N.; Jayarathne, E. Explainable AI Approaches for Detecting and Mitigating Phishing Attacks: A Review. JDS, 7(2), (2025). https://doi.org/10.33847/2686-8296.7.2_4
Abstract. Phishing remains one of the most pervasive and sophisticated cybersecurity threats, increasingly leveraging social engineering, AI-driven content generation, and multi-vector delivery methods. While machine learning (ML) and deep learning (DL) models have significantly advanced phishing detection capabilities, their “black-box” nature often limits transparency, trust, and practical adoption in real-world security environments. Explainable Artificial Intelligence (XAI) offers a solution by providing interpretable insights into model decisions, enabling analysts and stakeholders to understand, validate, and act upon automated classifications. This semi-systematic review examines contemporary XAI techniques applied to phishing detection, focusing on studies published between 2017 and 2025. Searches conducted across Scopus, IEEE Xplore, and Google Scholar yielded peer-reviewed literature integrating explainability into ML/DL-based phishing detection. The selected studies were synthesized to identify the types of models used, the XAI methods employed, and their contributions to interpretability, operational value, and human–AI collaboration. Findings show that feature attribution methods such as SHAP, LIME, and Integrated Gradients are the most widely adopted, offering both global and local explanations for text-based and URL-based phishing detection. Attention mechanisms and visualization techniques further enhance transparency in deep learning models, while interpretable models, such as decision trees and logistic regression, remain valuable for contexts requiring high clarity. However, gaps persist in real-world validation, dataset diversity, standardized metrics for evaluating explanations, and deployment feasibility. Overall, XAI strengthens phishing mitigation by improving user trust, supporting analyst decision-making, and enabling more accountable AI-driven security systems. The review highlights the need for scalable, human-centred, and adversarially robust XAI approaches to support the next generation of phishing detection frameworks.
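To make the feature attribution workflow described above concrete, the following minimal sketch applies SHAP (Lundberg and Lee, 2017) to a toy URL-based phishing classifier. It is not drawn from any of the reviewed studies: the feature names, synthetic data, and random-forest model are hypothetical placeholders used only to show how a local explanation surfaces per-feature contributions for an analyst.

```python
# Illustrative sketch only: hypothetical lexical URL features and synthetic
# labels, not data or a model from any of the reviewed studies.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["url_length", "num_dots", "has_ip_host", "https_used"]

# Synthetic data encoding a common intuition: phishing URLs tend to be
# longer, dot-heavy, IP-hosted, and less likely to use HTTPS.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] + X[:, 2] - X[:, 3] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])

# Older shap releases return a list of per-class arrays; newer ones
# return a single 3-D array indexed by (sample, feature, class).
phish_vals = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

# Local explanation: each feature's signed contribution to the
# "phishing" score for this one URL.
for name, val in zip(feature_names, phish_vals[0]):
    print(f"{name}: {val:+.3f}")
```

A LIME-based local explanation follows the same pattern, substituting a perturbation-based surrogate model for the exact tree attribution.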
References
- N. Alsuqayh, A. Mirza and A. Alhogail, “Exploring feature engineering and explainable AI for phishing website detection: a systematic literature review,” International Journal of Electrical and Computer Engineering (IJECE), 2025. DOI: 10.11591/ijece.v15i6.pp5863-5878.
- P. R. Chandre, P. Bhujbal, A. Jadhav, B. D. Shendkar, A. Wangikar and R. Sachdeo, “A comprehensive review of interpretable machine learning techniques for phishing attack detection,” IAES International Journal of Artificial Intelligence, 2025. DOI: 10.11591/ijai.v14.i4.pp3022-3032.
- M. Tawfik, A. A. Abu-Ein, A. Abdelhaliem and S. FathiIslam, “Explainable few-shot learning with modern BERT for detecting emerging phishing attacks using XF PhishBERT,” Scientific Reports, 2025. DOI: 10.1038/s41598-025-27500-0.
- N. Thalpage, “The Integration of Machine Learning and Explainable AI in Business Digitization: Unleashing the Power of Data - A Review,” Journal of Digital Science, 6(1), 2023. https://doi.org/10.33847/2686-8296.6.1_2.
- F. Doshi-Velez and B. Kim, “Towards A Rigorous Science of Interpretable Machine Learning,” 2017. https://api.semanticscholar.org/CorpusID:11319376.
- N. S. Thalpage and T. A. D. Nisansala, “Exploring the Opportunities of Applying Digital Twins for Intrusion Detection in Industrial Control Systems of Production and Manufacturing – A Systematic Review,” in Data Protection in a Post-Pandemic Society, 2023. https://doi.org/10.1007/978-3-031-34006-2_4.
- M. Mehdi, Y. Farzaneh, S. Farzaneh, S. Elham and H. Gharaee, “An Adaptive Machine Learning Based Approach for Phishing Detection Using Hybrid Features,” in 2019 5th International Conference on Web Research (ICWR), 2019. DOI: 10.1109/ICWR.2019.8765265.
- S. Baki and R. Verma, “Sixteen Years of Phishing User Studies: What Have We Learned?,” IEEE Transactions on Dependable and Secure Computing, 2022. DOI: 10.1109/TDSC.2022.3151103.
- O. K. Sahingoz, E. Buber, O. Demir and B. Diri, “Machine learning based phishing detection from URLs,” Expert Systems with Applications, 2019. https://doi.org/10.1016/j.eswa.2018.09.029.
- R. Basnet, S. Mukkamala and A. H. Sung, “Detection of Phishing Attacks: A Machine Learning Approach,” in Studies in Fuzziness and Soft Computing, 2008. DOI: 10.1007/978-3-540-77465-5_19.
- M. Adebowale and K. Lwin, “Deep Learning with Convolutional Neural Network and Long Short-Term Memory for Phishing Detection,” in 2019 International Conference on Software, Knowledge, Information Management and Applications (SKIMA), 2019. DOI: 10.1109/SKIMA47702.2019.898242.
- U. Bhatt, A. Xiang and P. Eckersley, “Explainable machine learning in deployment,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020. https://doi.org/10.1145/3351095.3375624.
- S. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” in Advances in Neural Information Processing Systems (NIPS), 2017. DOI: 10.48550/arXiv.1705.07874.
- M. T. Ribeiro, S. Singh and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. https://doi.org/10.1145/2939672.2939778.
- M. Sundararajan, A. Taly and Q. Yan, “Axiomatic Attribution for Deep Networks,” 2017. DOI: 10.48550/arXiv.1703.01365.
- W. Samek, T. Wiegand and K.-R. Müller, “Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models,” 2017. DOI: 10.48550/arXiv.1708.08296.
- A. Vaswani, N. Shazeer, N. Parmar and I. Polosukhin, “Attention Is All You Need,” 2017. DOI: 10.48550/arXiv.1706.03762.
- R. Guidotti, A. Monreale, F. Turini and F. Giannotti, “A Survey of Methods for Explaining Black Box Models,” ACM Computing Surveys, 2018. DOI: 10.1145/3236009.
- T. Miller, “Explanation in Artificial Intelligence: Insights from the Social Sciences,” Artificial Intelligence, 2019. DOI: 10.1016/j.artint.2018.07.007.
- F. Doshi-Velez and B. Kim, “A Roadmap for a Rigorous Science of Interpretability,” 2017. DOI: 10.48550/arXiv.1702.08608.
- F. Charmet, H. C. Tanuwidjaja, S. Ayoubi and Z. Zhang, “Explainable artificial intelligence for cybersecurity: a literature survey,” Annals of Telecommunications, 2022. DOI: 10.1007/s12243-022-00926-7.
Published online 30.12.2025
