Enhancing Public Safety Through Advanced Video Analysis: A Conv-LSTM-SVM Model for Violence Detection in Surveillance Footage

  • Samuel Muigai Muiruri, Tharaka Nithi County Government
  • Mark Okong’o, PhD, Chuka University
  • David Mwathi, PhD, Chuka University
Keywords: Violence Detection, Video Analysis, Surveillance Systems, Deep Learning, Computer Vision, Artificial Intelligence, Machine Learning

Abstract

This study pioneers a new method for detecting violence in surveillance videos, addressing a major challenge in public safety and video analysis. It proposes a hybrid model that combines Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Support Vector Machines (SVM) to detect violent incidents in video data. The Convolutional Long Short-Term Memory and Support Vector Machine (Conv-LSTM-SVM) model unites the spatial feature extraction of CNNs, the temporal dependency modelling of LSTMs, and the classification power of SVMs. In the proposed architecture, a pre-trained DenseNet121 model extracts spatial features efficiently via transfer learning from large image datasets. An LSTM layer captures the temporal dynamics needed to understand how activity develops across a video sequence, while an SVM with a Radial Basis Function (RBF) kernel forms complex decision boundaries in the high-dimensional feature space for the final classification. Following an experimental research design, the model was developed, trained, and tested using the Keras library running on TensorFlow. It was evaluated on two well-known datasets: the UCF Crime dataset, which contains 1,900 surveillance clips spanning 13 classes of violent situations, and the RWF-2000 dataset of real-world fight footage. The proposed model outperforms CNN, LSTM, and Conv-LSTM baselines, achieving 97.3% accuracy on the UCF Crime dataset. Cross-dataset validation yielded 92.5% accuracy on the RWF-2000 dataset without modification, demonstrating robust generalization. The study also considers how public safety could be improved by processing multiple video streams in real time and reducing false alarms, and it examines the ethical challenges and limitations of automated surveillance systems, including privacy, bias, and the need for human oversight. By applying advanced video analysis, this research contributes to more efficient and adaptive surveillance systems that enhance public safety.
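
As an illustration of the architecture described above, a minimal sketch of such a Conv-LSTM-SVM pipeline could be assembled in Keras/TensorFlow with scikit-learn roughly as follows. The clip length, frame size, LSTM width, and SVM hyperparameters here are assumptions for illustration only, not the authors' published configuration.

import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.svm import SVC

FRAMES, H, W, C = 16, 224, 224, 3  # assumed clip length and frame size

def build_feature_extractor():
    """DenseNet121 (ImageNet weights) applied per frame, then an LSTM over the frame sequence."""
    base = tf.keras.applications.DenseNet121(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(H, W, C))
    base.trainable = False  # transfer learning: keep the pre-trained spatial filters

    clip = layers.Input(shape=(FRAMES, H, W, C))
    frame_feats = layers.TimeDistributed(base)(clip)  # spatial feature vector per frame
    temporal = layers.LSTM(256)(frame_feats)          # temporal dependencies across frames
    return models.Model(clip, temporal)

extractor = build_feature_extractor()

def extract_features(clips, batch_size=4):
    # clips: array of shape (n, FRAMES, H, W, C), pre-processed for DenseNet121
    return extractor.predict(clips, batch_size=batch_size)

# Final classifier: an RBF-kernel SVM over the Conv-LSTM features.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")

# Hypothetical usage with training clips X_train and labels y_train (1 = violent):
# svm.fit(extract_features(X_train), y_train)
# predictions = svm.predict(extract_features(X_test))

In this sketch the DenseNet121-LSTM stage acts purely as a feature extractor, and the RBF-kernel SVM is trained separately on the resulting feature vectors, mirroring the two-stage design outlined in the abstract.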

Published
16 August 2024
How to Cite
Muiruri, S., Okong’o, M., & Mwathi, D. (2024). Enhancing Public Safety Through Advanced Video Analysis: A Conv-LSTM-SVM Model for Violence Detection in Surveillance Footage. East African Journal of Information Technology, 7(1), 202–214. https://doi.org/10.37284/eajit.7.1.2117