Xiong, Jiale and Yang, Jing and Yang, Lei and Awais, Muhammad and Khan, Abdullah Ayub and Alizadehsani, Roohallah and Acharya, U. Rajendra (2024) Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights. Expert Systems with Applications, 238 (E). ISSN 0957-4174, DOI https://doi.org/10.1016/j.eswa.2023.122088.
Full text not available from this repository.

Abstract
Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embeddings, an enhanced artificial bee colony (ABC) optimization algorithm for pre-training, and a training process based on reinforcement learning (RL). The BERT model can be incorporated into a subsequent task and meticulously refined to function as a model, enabling it to apprehend a variety of linguistic characteristics. Imbalanced classification is one of the fundamental obstacles to PD. To handle this predicament, we present a novel methodology utilizing RL, in which the problem is framed as a series of sequential decisions in which an agent receives a reward at each level for classifying a received instance. To address the disparity between classes, it is determined that the majority class will receive a lower reward than the minority class. We also focus on the training stage, which often utilizes gradient-based learning techniques like backpropagation (BP), leading to certain drawbacks such as sensitivity to initialization. In our proposed model, we utilize a mutual learning-based ABC (ML-ABC) approach that adjusts the food source with the most beneficial results for the candidate by considering a mutual learning factor that incorporates the initial weight. We evaluated the efficacy of our novel approach by contrasting its results with those of population-based techniques using three standard datasets, namely Stanford Natural Language Inference (SNLI), Microsoft Research Paraphrase Corpus (MSRP), and Semantic Evaluation Database (SemEval2014). Our model attained excellent results that outperformed state-of-the-art models. Optimal values for important parameters, including the reward function, are identified for the model based on experiments on the study dataset. Ablation studies that exclude the proposed ML-ABC and reinforcement learning from the model confirm the independent positive incremental impact of these components on model performance.
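The class-dependent reward described in the abstract can be illustrated with a minimal sketch. The abstract states only that the majority class receives a lower reward than the minority class. The specific values here (a magnitude of 1 for minority instances, a smaller magnitude `lam` for majority instances, and a negative reward of the same magnitude for a wrong decision) are assumptions made for illustration, as are the names `reward`, `episode_return`, `minority_label`, and `lam`. None of them is taken from the paper.

```python
from typing import Iterable


def reward(predicted: int, actual: int,
           minority_label: int = 1, lam: float = 0.25) -> float:
    """Reward for one classification decision (illustrative values only).

    Assumption: minority-class instances carry magnitude 1.0 and
    majority-class instances carry the smaller magnitude `lam`.
    A correct decision earns +magnitude, a wrong one -magnitude.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    magnitude = 1.0 if actual == minority_label else lam
    return magnitude if predicted == actual else -magnitude


def episode_return(predictions: Iterable[int], labels: Iterable[int],
                   minority_label: int = 1, lam: float = 0.25) -> float:
    """Sum of per-instance rewards over one pass through the data."""
    return sum(reward(p, y, minority_label, lam)
               for p, y in zip(predictions, labels))
```

Under these assumed values, misclassifying a minority instance costs four times as much as misclassifying a majority instance, so an agent maximizing the return is pushed away from the trivial policy of always predicting the majority class.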
Item Type: | Article |
---|---|
Funders: | UNSPECIFIED |
Uncontrolled Keywords: | Plagiarism detection; Unbalanced classification; Bidirectional encoder representations from transformers; Artificial bee colony; Reinforcement learning |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > TK Electrical engineering. Electronics Nuclear engineering |
Divisions: | Faculty of Computer Science & Information Technology |
Depositing User: | Ms. Juhaida Abd Rahim |
Date Deposited: | 05 Jul 2024 03:06 |
Last Modified: | 05 Jul 2024 03:06 |
URI: | http://eprints.um.edu.my/id/eprint/44306 |