SGBANet: Semantic GAN and balanced attention network for arbitrarily oriented scene text recognition

Zhong, Dajian and Lyu, Shujing and Shivakumara, Palaiahnakote and Yin, Bing and Wu, Jiajia and Pal, Umapada and Lu, Yue (2022) SGBANet: Semantic GAN and balanced attention network for arbitrarily oriented scene text recognition. In: Computer Vision - ECCV 2022, PT XXVIII, 23-27 October 2022, Tel Aviv.

Full text not available from this repository.
Official URL: https://link.springer.com/chapter/10.1007/978-3-03...

Abstract

Scene text recognition is a challenging task due to complex backgrounds and the diverse variations of text instances. In this paper, we propose a novel Semantic GAN and Balanced Attention Network (SGBANet) to recognize text in scene images. The proposed method first generates simple semantic features using the Semantic GAN and then recognizes the scene text with the Balanced Attention Module. The Semantic GAN aims to align the semantic feature distribution between the support domain and the target domain. Unlike conventional image-to-image translation methods, which operate at the image level, the Semantic GAN performs generation and discrimination at the semantic level with the Semantic Generator Module (SGM) and Semantic Discriminator Module (SDM). For target images (scene text images), the Semantic Generator Module generates simple semantic features that share the same feature distribution as those of support images (clear text images). The Semantic Discriminator Module is used to distinguish between the semantic features of the support domain and those of the target domain. In addition, a Balanced Attention Module is designed to alleviate the problem of attention drift. The Balanced Attention Module first learns a balancing parameter based on the visual glimpse vector and the semantic glimpse vector, and then performs the balancing operation to obtain a balanced glimpse vector. Experiments on six benchmarks, including the regular datasets IIIT5K, SVT, and ICDAR2013 and the irregular datasets ICDAR2015, SVTP, and CUTE80, validate the effectiveness of the proposed method.
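
The abstract does not give the formula behind the balancing operation, so the following is only an illustrative sketch of one plausible reading: a sigmoid gate computed from the concatenated visual and semantic glimpse vectors, then used to form a per-dimension convex combination of the two. The function name `balanced_glimpse`, the gate form, and the parameters `weights` and `bias` are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function; fine for the small inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


def balanced_glimpse(visual, semantic, weights, bias):
    """Blend two glimpse vectors with a per-dimension gate.

    visual, semantic: lists of floats, both of length d.
    weights: d rows of length 2*d (hypothetical learned parameters).
    bias: list of d floats (hypothetical learned parameters).
    Returns (balanced, gate), both lists of length d.
    """
    if len(visual) != len(semantic):
        raise ValueError("glimpse vectors must have the same length")
    concat = visual + semantic  # list concatenation: [g_v ; g_s]
    # Assumed gate: one sigmoid unit per output dimension.
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    # Convex combination: gate near 1 favours the visual glimpse,
    # gate near 0 favours the semantic glimpse.
    balanced = [a * v + (1.0 - a) * s for a, v, s in zip(gate, visual, semantic)]
    return balanced, gate


random.seed(0)
d = 4
visual = [random.uniform(-1, 1) for _ in range(d)]
semantic = [random.uniform(-1, 1) for _ in range(d)]
weights = [[random.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]
bias = [0.0] * d

balanced, gate = balanced_glimpse(visual, semantic, weights, bias)
print("gate:", [round(a, 3) for a in gate])
print("balanced:", [round(x, 3) for x in balanced])
```

Under this assumed form, each component of the balanced glimpse vector lies between the corresponding visual and semantic components, so neither source can dominate unless the gate saturates. In a trained network the gate parameters would be learned jointly with the recognizer rather than sampled at random as in this toy usage.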

Item Type: Conference or Workshop Item (Lecture)
Funders: National Key Research & Development Program of China [Grant No: 2020AAA0107903], National Natural Science Foundation of China (NSFC) [Grant No: 62176091; 19ZR1415900]
Additional Information: 17th European Conference on Computer Vision (ECCV), Tel Aviv, ISRAEL, OCT 23-27, 2022.
Uncontrolled Keywords: Semantic GAN; Semantic generator; Semantic discriminator; Balanced attention; Scene text recognition
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 04 Nov 2025 01:40
Last Modified: 04 Nov 2025 01:40
URI: http://eprints.um.edu.my/id/eprint/40495
