A Review: Campus Violence Detection Using Deep Learning Models
Liqaa M. Shoohi
Baghdad University, College of Physical Education and Sports Sciences for Women, Baghdad, Iraq.
http://doi.org/10.37648/ijiest.v12i01.007
Abstract
This paper presents a systematic review of deep learning methods for detecting violence on campus, a critical problem in intelligent surveillance for improving student safety and enabling prompt intervention in violent incidents. The review covers studies published from 2018 to 2025 and concentrates on model architectures for detecting fights, bullying, vandalism, and aggressive behavior in campus settings, where occlusion, lighting variation, and complex human interactions make detection difficult. It compares deep learning approaches including CNNs, RNNs, 3D CNNs, attention-based networks, transformers, graph neural networks, neuro-fuzzy systems, multimodal systems, and federated learning methods. It also assesses commonly used benchmark datasets, performance measures, and real-time deployment considerations. The findings show that lightweight CNN models are well suited to real-time use but cannot model temporal dynamics, whereas hybrid CNN-RNN and attention-based models may provide better accuracy at a higher computational cost. Transformer and multimodal models show promising performance but are computationally expensive to deploy on edge devices. The review identifies important research gaps, including a lack of campus-specific datasets, insufficient multimodal integration, privacy concerns, and the need for explainable and lightweight implementations. This work can guide further research toward practical, effective, and privacy-conscious violence detection systems in educational settings.
Keywords: Violence detection; Campus surveillance; Deep learning; CNN; Transformer; Video analysis; Multimodal learning; Federated learning; Computer vision.