Deep learning approaches for speech emotion recognition: State of the art and research challenges

Jahangir, Rashid and Teh, Ying Wah and Hanif, Faiqa and Mujtaba, Ghulam (2021) Deep learning approaches for speech emotion recognition: State of the art and research challenges. Multimedia Tools and Applications, 80 (16). pp. 23745-23812. ISSN 1380-7501, DOI https://doi.org/10.1007/s11042-020-09874-7.

Full text not available from this repository.

Abstract

Speech emotion recognition (SER) systems identify emotions from the human voice in areas such as smart healthcare, driving, call centers, automatic translation systems, and human-machine interaction. In the classical SER process, discriminative acoustic feature extraction is the most important and challenging step, because discriminative features influence classifier performance and decrease computational time. Nonetheless, current handcrafted acoustic features suffer from limited capability and accuracy when used to construct a SER system for real-time implementation. Therefore, to overcome the limitations of handcrafted features, a variety of deep learning techniques have been proposed and employed in recent years for automatic feature extraction in emotion prediction from speech signals. However, to the best of our knowledge, no in-depth review study is available that critically appraises and summarizes the existing deep learning techniques for SER along with their strengths and weaknesses. Hence, this study aims to present a comprehensive review of deep learning techniques for SER, including their uniqueness, benefits, and limitations. This review also presents speech processing techniques, performance measures, and publicly available emotional speech databases, and it discusses the significance of the findings of the primary studies. Finally, it presents open research issues and challenges that need significant research effort and enhancement in the field of SER systems.

Item Type: Article
Funders: UNSPECIFIED
Uncontrolled Keywords: SER; Deep learning; Survey; Acoustic features; Emotional speech databases
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology > Department of Computer System & Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 03 Mar 2022 04:20
Last Modified: 03 Mar 2022 04:20
URI: http://eprints.um.edu.my/id/eprint/26446
