Novel multimodal contrast learning framework using zero-shot prediction for abnormal behavior recognition

Liu, Hai Chuan and Khairuddin, Anis Salwa Mohd and Chuah, Joon Huang and Zhao, Xian Min and Wang, Xiao Dan and Fang, Li Ming and Kong, Si Bo (2025) Novel multimodal contrast learning framework using zero-shot prediction for abnormal behavior recognition. Applied Intelligence, 55 (2). p. 110. ISSN 0924-669X, DOI https://doi.org/10.1007/s10489-024-05994-x.

Full text not available from this repository.
Official URL: https://doi.org/10.1007/s10489-024-05994-x

Abstract

Human abnormal behavior detection is important for ensuring public safety and preventing unwanted incidents. Current recognition systems for human abnormal behavior adopt neural network models and perform a standard 1-of-N majority voting procedure. However, recognizing human abnormal behaviors is challenging because video datasets are lengthy and numerous, and because existing methods rely on predefined categories and scenarios. This study proposes a novel method named Visual Text Contrastive Learning (VTCL) for identifying abnormal human behavior in campus settings. Rather than relying on simple numerical representations, the proposed model emphasizes semantic information from videos of abnormal behaviors and from automatically labeled property text. Within the visual branch, the method integrates cross-frame and multi-frame methods to improve spatial and temporal performance. In the textual branch, the proposed prompting technique captures the contextual background of abnormal behaviors to enrich supervision with behavioral semantic information. The model then learns visual-text features through contrastive learning to enhance the learning process. In addition, this work presents a new study of zero-shot campus abnormal behavior recognition (CABR), laying the foundation for highly available and robust CABR across multiple, and even new, scenarios. The proposed VTCL model achieved a Top-1 accuracy of 86.92% and a Top-5 accuracy of 98.14% on the CABR50 dataset, which covers fifty abnormal behaviors on campus, with competitive computational complexity. Furthermore, the model's zero-shot performance was competitive when evaluated on additional datasets, including CABRZ6 and UCF-101.
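The abstract describes aligning visual and text features through contrastive learning, predicting behaviors zero-shot, and reporting Top-1 and Top-5 accuracy. The exact loss used by VTCL is not stated in this record, so the sketch below assumes a CLIP-style symmetric InfoNCE objective over cosine similarities with a hypothetical temperature of 0.07, and assumes zero-shot prediction works by ranking class-prompt text embeddings by cosine similarity to a video embedding. The embeddings are assumed to come from the visual and textual branches, which are not modeled here, and every function name is hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def mean_cross_entropy(logits: List[List[float]], targets: List[int]) -> float:
    # Mean of -log softmax(row)[target], using the log-sum-exp trick for stability.
    total = 0.0
    for row, y in zip(logits, targets):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[y]
    return total / len(logits)


def symmetric_contrastive_loss(videos: List[Vector], texts: List[Vector],
                               temperature: float = 0.07) -> float:
    # Assumed CLIP-style objective: the i-th video is paired with the i-th text.
    logits = [[cosine(v, t) / temperature for t in texts] for v in videos]
    targets = list(range(len(videos)))
    video_to_text = mean_cross_entropy(logits, targets)
    text_to_video = mean_cross_entropy([list(col) for col in zip(*logits)], targets)
    return 0.5 * (video_to_text + text_to_video)


def zero_shot_top_k(video: Vector, class_texts: List[Vector], k: int = 5) -> List[int]:
    # Rank candidate class prompts (possibly unseen in training) by similarity.
    scores = [cosine(video, t) for t in class_texts]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def top_k_accuracy(ranked: List[List[int]], labels: List[int], k: int) -> float:
    # A sample counts as correct if its true label appears in its top-k ranking.
    hits = sum(1 for preds, y in zip(ranked, labels) if y in preds[:k])
    return hits / len(labels)


if __name__ == "__main__":
    # Toy 3-D embeddings standing in for encoder outputs (illustrative only).
    videos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    matched = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
    print(symmetric_contrastive_loss(videos, matched))
    print(symmetric_contrastive_loss(videos, shuffled))
    print(zero_shot_top_k([0.9, 0.1, 0.0], matched, k=3))
```

In this sketch, matched video-text pairs give a much lower loss than shuffled pairs, which is the signal that pulls paired features together. Zero-shot recognition then needs only text embeddings for new behavior classes, so unseen categories can be ranked without retraining the visual branch.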

Item Type: Article
Funders: Universiti Malaya, Malaysia (ST018-2023)
Uncontrolled Keywords: Abnormal behavior; Multimodal learning; Recognition; Semantic information; Zero-shot
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Engineering > Department of Electrical Engineering
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 17 Mar 2025 04:47
Last Modified: 17 Mar 2025 04:47
URI: http://eprints.um.edu.my/id/eprint/47226