IJIEST

A Review: Campus Violence Detection Using Deep Learning Models

Liqaa M. Shoohi

Baghdad University, College of Physical Education and Sports Sciences for Women, Baghdad, Iraq.

pp. 47-63, Vol. 12, Issue 1, Jan-Dec, 2026

Received: 2026-01-06; Accepted: 2026-02-08; Published: 2026-03-02


http://doi.org/10.37648/ijiest.v12i01.007

Abstract

This paper presents a systematic review of deep learning methods for detecting violence on campus, a critical problem in intelligent surveillance that bears on student safety and the prompt interruption of violent incidents. The review covers studies published from 2018 to 2025, concentrating on model architectures for detecting fights, bullying, vandalism, and aggressive behavior in campus settings made difficult by occlusion, lighting variation, and complex human interactions. The research design includes a comparative study of deep learning approaches, including CNNs, RNNs, 3D CNNs, attention-based networks, transformers, graph neural networks, neuro-fuzzy systems, multimodal systems, and federated learning methods. The paper also assesses commonly used benchmark datasets, performance measures, and real-time deployment considerations. The findings show that lightweight CNN models are well suited to real-time use but cannot model temporal dynamics, whereas hybrid CNN-RNN and attention-based models may provide better accuracy at increased computational cost. Transformer and multimodal models have shown promising performance but are computationally expensive to deploy, for example on edge devices. The review identifies important research gaps, such as a lack of campus-specific datasets, insufficient multimodal integration, privacy issues, and the need for explainable and lightweight implementations. This work can guide further research toward viable, effective, and privacy-conscious violence detection systems in educational settings.

Keywords: Violence detection; Campus surveillance; Deep learning; CNN; Transformer; Video analysis; Multimodal learning; Federated learning; Computer vision.



2454-9584

2454-8111

6.713

6.464


INTERNATIONAL JOURNAL OF INVENTIONS IN ENGINEERING & SCIENCE TECHNOLOGY

International Peer Reviewed (Refereed), Open Access Research Journal

(By Aryavart International University, India)
