aub_admin, January 26, 2022

A Deep Spatio-Temporal Network for Vision-Based Sexual Harassment Detection

Md Shamimul Islam1, Jalal Uddin Md Akbar2, Md Mahedi Hasan3, N HM Arafat4, Sohaib Abdullah1, Saydul Akbar Murad5

1 Dept. of CSE, Asian University of Bangladesh, Dhaka, Bangladesh. 
2 Dept. of CSE, International Islamic University Chittagong, Bangladesh.
3 IICT, BUET, Dhaka, Bangladesh.
4 Dept. of Computer Science and Technology, Henan Polytechnic University, China.
5 Faculty of Computing, Universiti Malaysia Pahang, Malaysia.

Abstract

Smart surveillance systems can help law enforcement detect sexual harassment in real time, which can reduce such incidents. Detecting sexual harassment in video in real time is a complex computer vision task because of factors such as variation in clothing and carried objects, illumination changes, partial occlusion, low resolution, and view-angle variation. With advances in convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks, human action recognition has achieved great success in recent years. To address this problem, we build a video dataset, the Sexual Harassment Video (SHV) dataset, which consists of harassment and non-harassment videos collected from YouTube. We also build a CNN-LSTM network to detect sexual harassment, in which the CNN extracts spatial features and the LSTM, a recurrent neural network (RNN), extracts temporal features. State-of-the-art pretrained models are also employed as spatial feature extractors, followed by an LSTM and three dense layers, to classify harassment activities. Moreover, to assess the robustness of the proposed model, we conducted several experiments on two other benchmark datasets, the Hockey Fight dataset and the Movie Violence dataset, and achieved state-of-the-art accuracy.
Index Terms: sexual harassment, surveillance systems, deep learning, CNN-LSTM, action recognition.
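
The data flow described in the abstract (per-frame spatial features, an LSTM over the frame sequence, then three dense layers) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the row-mean `spatial_features` function is a hypothetical stand-in for a CNN or pretrained backbone, the layer sizes and random weights are arbitrary assumptions, biases are omitted, and no training is performed.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def spatial_features(frame):
    """Hypothetical stand-in for the CNN: one mean value per pixel row."""
    return [sum(row) / len(row) for row in frame]


def lstm_step(x, h, c, p):
    """One LSTM time step (biases omitted for brevity)."""
    z = x + h  # concatenate the input features and the previous hidden state
    i = [sigmoid(v) for v in matvec(p["Wi"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(p["Wf"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(p["Wo"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(p["Wg"], z)]  # candidate cell state
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def init_params(feature_dim, hidden, rng):
    """Random, untrained weights; all sizes here are illustrative assumptions."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {name: mat(hidden, feature_dim + hidden) for name in ("Wi", "Wf", "Wo", "Wg")}
    p["D1"], p["D2"], p["D3"] = mat(8, hidden), mat(4, 8), mat(1, 4)
    return p


def classify_clip(frames, p):
    """Return a score in (0, 1) for a clip given as a list of 2-D frames."""
    hidden = len(p["Wi"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in frames:  # temporal modelling over the frame sequence
        h, c = lstm_step(spatial_features(frame), h, c, p)
    y = [relu(v) for v in matvec(p["D1"], h)]  # dense layer 1
    y = [relu(v) for v in matvec(p["D2"], y)]  # dense layer 2
    return sigmoid(matvec(p["D3"], y)[0])      # dense layer 3, binary output


rng = random.Random(0)
# A synthetic clip: 5 grayscale frames of 6x6 pixels with values in [0, 1).
frames = [[[rng.random() for _ in range(6)] for _ in range(6)] for _ in range(5)]
params = init_params(feature_dim=6, hidden=10, rng=rng)
score = classify_clip(frames, params)
print(f"harassment score (untrained): {score:.3f}")
```

In practice the feature extractor would be a trained CNN and the weights would be learned from labelled clips such as those in the SHV dataset; the sketch only shows how the spatial and temporal stages connect.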