Baby cry recognition based on SLGAN model data generation and deep feature fusion

Zhang, Ke and Ting, Hua-Nong and Choo, Yao-Mun (2024) Baby cry recognition based on SLGAN model data generation and deep feature fusion. Expert Systems with Applications, 242. ISSN 1873-6793, DOI https://doi.org/10.1016/j.eswa.2023.122681.

Full text not available from this repository.

Official URL: https://doi.org/10.1016/j.eswa.2023.122681

Abstract

Deep learning models have been applied to baby cry recognition to improve recognition accuracy. However, current research still suffers from a data imbalance problem, which biases model learning. A Sparse Autoencoder Long Short-Term Memory based Generative Adversarial Network (SLGAN) is proposed to solve this problem. The proposed SLGAN model generates new baby cry data so that every cry class has the same number of samples. Speech features are extracted using Mel-spectrograms and the Short-Time Fourier Transform (STFT). Two deep learning models, VGG16 and VGG19, are used to extract deep features from them. The deep features are then reduced in dimension using Principal Component Analysis (PCA) and fused by a sparse autoencoder model. Finally, the fused features are used to train a Deep Neural Network, which performs the classification. The experimental results show that the proposed method outperforms existing methods.
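
The balancing goal stated in the abstract (an equal number of samples for every cry class) can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' implementation: it only counts how many generated samples each class would need, and it assumes the target is the size of the largest class, which the abstract does not state. The function name `synthetic_samples_needed`, the class names, and the counts are all hypothetical.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Return, per class, how many generated samples would raise it to the largest class size."""
    counts = Counter(labels)
    if not counts:
        return {}
    # Assumption: every class is topped up to the size of the largest class.
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label list; class names and counts are illustrative only.
labels = ["hunger"] * 5 + ["pain"] * 2 + ["sleepy"] * 3
needed = synthetic_samples_needed(labels)
print(needed)  # {'hunger': 0, 'pain': 3, 'sleepy': 2}
```

In a pipeline like the one described, a generative model such as SLGAN would then be asked to produce that many additional samples for each under-represented class before feature extraction.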

Item Type:	Article
Funders:	Universiti Malaya [GPF074A-2018]
Uncontrolled Keywords:	Baby cry; Data generation; Generative adversarial networks (GANs); Sparse autoencoder; Feature fusion
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Engineering; Faculty of Engineering > Department of Biomedical Engineering; Faculty of Medicine > Paediatrics Department
Depositing User:	Ms. Juhaida Abd Rahim
Date Deposited:	28 Jun 2024 08:15
Last Modified:	28 Jun 2024 08:15
URI:	http://eprints.um.edu.my/id/eprint/44265
