Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights

Xiong, Jiale and Yang, Jing and Yang, Lei and Awais, Muhammad and Khan, Abdullah Ayub and Alizadehsani, Roohallah and Acharya, U. Rajendra (2024) Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights. Expert Systems with Applications, 238 (E). ISSN 0957-4174, DOI https://doi.org/10.1016/j.eswa.2023.122088.

Full text not available from this repository.
Official URL: https://doi.org/10.1016/j.eswa.2023.122088

Abstract

Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embeddings, an enhanced artificial bee colony (ABC) optimization algorithm for pre-training, and a training process based on reinforcement learning (RL). The BERT model can be incorporated into a downstream task and fine-tuned, enabling it to capture a variety of linguistic characteristics. Imbalanced classification is one of the fundamental obstacles to PD. To handle this problem, we present a novel methodology utilizing RL, in which the problem is framed as a series of sequential decisions in which an agent receives a reward at each step for classifying a received instance. To address the disparity between classes, the majority class receives a lower reward than the minority class. We also focus on the training stage, which often utilizes gradient-based learning techniques like backpropagation (BP), leading to certain drawbacks such as sensitivity to initialization. In our proposed model, we utilize a mutual learning-based ABC (ML-ABC) approach that adjusts the food source with the most beneficial results for the candidate by considering a mutual learning factor that incorporates the initial weight. We evaluated the efficacy of our novel approach by contrasting its results with those of population-based techniques using three standard datasets, namely Stanford Natural Language Inference (SNLI), Microsoft Research Paraphrase Corpus (MSRP), and Semantic Evaluation Database (SemEval2014). Our model attained excellent results that outperformed state-of-the-art models. Optimal values for important parameters, including the reward function, are identified for the model based on experiments on the study dataset. Ablation studies that exclude the proposed ML-ABC and reinforcement learning from the model confirm the independent positive incremental impact of these components on model performance.
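
The class-dependent reward described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the reward magnitudes (1 for the minority class, a smaller value lam for the majority class), and the default lam = 0.2 are assumptions chosen only to show how a majority-class instance can earn a lower reward than a minority-class one.

```python
# Hypothetical sketch of a class-weighted reward for imbalanced classification.
# All names and numeric values here are illustrative assumptions, not taken from the paper.

def reward(label: int, action: int, minority_label: int = 1, lam: float = 0.2) -> float:
    """Return the reward for classifying one instance.

    label          -- true class of the instance
    action         -- class predicted by the agent
    minority_label -- which class is treated as the minority (assumed to be 1)
    lam            -- assumed scale in (0, 1) applied to majority-class rewards
    """
    # Minority-class instances carry full weight; majority-class ones are scaled down.
    magnitude = 1.0 if label == minority_label else lam
    # Correct decisions are rewarded, incorrect ones are penalised by the same magnitude.
    return magnitude if action == label else -magnitude


def episode_return(labels, actions, minority_label: int = 1, lam: float = 0.2) -> float:
    """Sum the per-step rewards over a sequence of classification decisions."""
    return sum(reward(y, a, minority_label, lam) for y, a in zip(labels, actions))
```

Under these assumptions, a mistake on a minority-class pair costs the agent more than a mistake on a majority-class pair, which pushes the learned policy away from always predicting the majority class.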

Item Type: Article
Funders: UNSPECIFIED
Uncontrolled Keywords: Plagiarism detection; Unbalanced classification; Bidirectional encoder representations from transformers; Artificial bee colony; Reinforcement learning
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Computer Science & Information Technology
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 05 Jul 2024 03:06
Last Modified: 05 Jul 2024 03:06
URI: http://eprints.um.edu.my/id/eprint/44306
