A new hybrid method for caption and scene text classification in action video images

Nandanwar, Lokesh and Shivakumara, Palaiahnakote and Pal, Umapada and Lu, Tong and Blumenstein, Michael (2021) A new hybrid method for caption and scene text classification in action video images. International Journal of Pattern Recognition and Artificial Intelligence, 35 (12). ISSN 0218-0014, DOI https://doi.org/10.1142/S0218001421600090.

Full text not available from this repository.


Achieving a better recognition rate for text in action video images is challenging because multiple types of text appear against backgrounds with unpredictable actions. In this paper, we propose a new method for classifying caption text (text edited into the video) and scene text (text that is part of the video itself) in video images. This work considers five action classes, namely Yoga, Concert, Teleshopping, Craft, and Recipes, in which both types of text are expected to play a vital role in understanding the video content. The proposed method introduces a new fusion criterion based on Discrete Cosine Transform (DCT) and Fourier coefficients to obtain reconstructed images for caption and scene text. The fusion criterion computes the variances of the coefficients at corresponding pixels of the DCT and Fourier images, and these variances are used as the respective weights. This step yields Reconstructed image-1. Inspired by the special property of Chebyshev-Harmonic-Fourier-Moments (CHFM), which can reconstruct a redundancy-free image, we explore CHFM to obtain Reconstructed image-2. The reconstructed images, along with the input image, are passed to a Deep Convolutional Neural Network (DCNN) for caption/scene text classification. Experimental results on the five action classes and a comparative study with existing methods demonstrate that the proposed method is effective. In addition, recognition results obtained with different methods before and after classification show that recognition performance improves significantly once the text has been classified.
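The variance-weighted fusion described in the abstract can be illustrated with a minimal sketch. It is one possible reading of that description, not the authors' implementation. The abstract does not specify how the variances are computed, how the weights are normalised, or how the fused coefficients are mapped back to an image. The sketch assumes a local 3x3 window variance, weights normalised to sum to one, Fourier magnitudes in place of complex coefficients, and an inverse DCT for reconstruction. The names `dct_matrix`, `local_variance`, and `fuse` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def local_variance(a, window=3):
    """Variance of each coefficient's window x window neighbourhood.

    Assumption: the abstract does not say how the variance is taken,
    so an edge-padded local window is used here.
    """
    pad = window // 2
    padded = np.pad(a, pad, mode="edge")
    views = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    return views.var(axis=(-2, -1))


def fuse(image, window=3, eps=1e-12):
    """Variance-weighted fusion of DCT and Fourier-magnitude coefficients."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    c_r, c_c = dct_matrix(rows), dct_matrix(cols)

    dct = c_r @ image @ c_c.T                           # 2-D DCT-II coefficients
    fourier = np.abs(np.fft.fft2(image, norm="ortho"))  # Fourier magnitudes

    var_d = local_variance(dct, window)
    var_f = local_variance(fourier, window)
    total = var_d + var_f + eps
    w_d, w_f = var_d / total, var_f / total             # assumed normalisation

    fused = w_d * dct + w_f * fourier
    # Assumption: the fused coefficients are mapped back with the inverse DCT.
    return c_r.T @ fused @ c_c


if __name__ == "__main__":
    demo = np.random.default_rng(0).random((16, 16))
    print(fuse(demo).shape)
```

Under these assumptions, the output has the same shape as the input and could stand in for Reconstructed image-1 as one of the inputs to a classifier. The CHFM reconstruction and the DCNN are not sketched here.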

Item Type: Article
Funders: Universiti Malaya (GPF096A-2020) (GPF096B-2020) (GPF096C-2020)
Uncontrolled Keywords: Caption text; Scene text; Fusion; DCT coefficients; Chebyshev-Harmonic-Fourier-moments; Caption and scene text classification; Action image recognition
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 24 Feb 2022 04:18
Last Modified: 24 Feb 2022 04:18
URI: http://eprints.um.edu.my/id/eprint/26813
