• E-ISSN: 2454-9584
  P-ISSN: 2454-8111
  Impact Factor 2024: 6.713
  Impact Factor 2023: 6.464

INTERNATIONAL JOURNAL OF INVENTIONS IN ENGINEERING & SCIENCE TECHNOLOGY

International Peer Reviewed (Refereed), Open Access Research Journal
(By Aryavart International University, India)

Paper Details

A Review: Campus Violence Detection Using Deep Learning Models

Liqaa M. Shoohi

Baghdad University, College of Physical Education and Sports Sciences for Women, Baghdad, Iraq.

Vol. 12, Issue 1, Jan-Dec 2026, pp. 47-63
Received: 2026-01-06; Accepted: 2026-02-08; Published: 2026-03-02

http://doi.org/10.37648/ijiest.v12i01.007

Abstract

This paper offers a systematic review of deep learning methods for detecting violence on campus, a critical problem in intelligent surveillance for improving student safety and enabling prompt intervention in violent incidents. The review covers studies published between 2018 and 2025, concentrating on model architectures for detecting fights, bullying, vandalism, and aggressive behavior in challenging campus settings affected by occlusion, lighting variations, and complex human interactions. The research design includes a comparative study of deep learning approaches, including CNNs, RNNs, 3D CNNs, attention-based networks, transformers, graph neural networks, neuro-fuzzy and multimodal systems, and federated learning methods. The paper also assesses commonly used benchmark datasets, performance metrics, and real-time deployment considerations. Findings show that lightweight CNN models are well suited to real-time use but cannot model temporal dynamics, whereas hybrid CNN-RNN and attention-based models may provide better accuracy at a higher computational cost. Transformer and multimodal models have shown promising performance but are computationally expensive to deploy on edge devices. The review identifies important research gaps, such as the lack of campus-specific datasets, insufficient multimodal integration, privacy concerns, and the need for explainable and lightweight implementations. This work can guide further research toward viable, effective, and privacy-conscious violence detection systems in educational settings.
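The federated learning and privacy points in the abstract rest on one idea: each site (for example, each campus camera network) trains on its own video locally, and only model parameters are sent to a server for aggregation. The snippet below is a minimal sketch of one common aggregation rule, federated averaging (FedAvg), in which each client's parameters are weighted by its number of local training clips. It is illustrative only and is not taken from any of the reviewed systems; the function name `fed_avg` and the dict-of-lists weight format are hypothetical stand-ins for real framework tensors.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a model is a dict mapping parameter names
# to flat lists of floats (a stand-in for real framework tensors).
Weights = Dict[str, List[float]]


def fed_avg(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Federated averaging: weight each client's parameters by its sample count.

    client_updates holds one (weights, num_local_clips) pair per client.
    Only the weights are shared with the server; raw video never leaves a site.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = client_updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in client_updates:
            client_values = weights[name]  # KeyError if a client lacks this parameter
            if len(client_values) != len(acc):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            # Each client contributes in proportion to its share of all samples.
            for i, v in enumerate(client_values):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


if __name__ == "__main__":
    # Two hypothetical sites: one with 300 labeled clips, one with 100.
    site_a = ({"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}, 300)
    site_b = ({"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}, 100)
    print(fed_avg([site_a, site_b]))  # {'fc.weight': [1.5, 2.5], 'fc.bias': [0.25]}
```

Weighting by sample count means a site with more labeled clips pulls the global model further toward its own update; in the example, the 300-clip site and the 100-clip site contribute in a 3:1 ratio. Real deployments would typically add secure aggregation or differential privacy on top, since shared weights can still leak some information.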

Keywords: Violence detection; Campus surveillance; Deep learning; CNN; Transformer; Video analysis; Multimodal learning; Federated learning; Computer vision.
