A neighborhood undersampling stacked ensemble (NUS-SE) in imbalanced classification

Seng, Zian and Kareem, Sameem Abdul and Varathan, Kasturi Dewi (2021) A neighborhood undersampling stacked ensemble (NUS-SE) in imbalanced classification. Expert Systems with Applications, 168. ISSN 0957-4174, DOI https://doi.org/10.1016/j.eswa.2020.114246.

Full text not available from this repository.


A stacked ensemble, which uses a meta-learner to combine (stack) the predictions of multiple base classifiers, performs suboptimally on imbalanced classification. To improve the classification performance of stacked ensembles on imbalanced datasets, we propose a method named Neighborhood Undersampling Stacked Ensemble (NUS-SE). NUS-SE consists of two proposed components: an undersampling-based stacked ensemble framework (US-SE) and an undersampling technique. The metadata generation step of a stacked ensemble commonly uses a cross-validation-like procedure (CV-prediction). Unfortunately, when undersampling is performed within a stacked ensemble that uses CV-prediction for metadata generation, the resulting metadata is incomplete, with missing prediction values. Therefore, in the proposed US-SE component, we replace the standard CV-prediction procedure with our proposed Subset and Out-of-Subset (S-OOS) prediction procedure. The S-OOS prediction procedure generates metadata without missing prediction values, thus enabling the integration of undersampling within a stacked ensemble. With undersampling integrated, multiple undersampled data subsets are used to train US-SE's base learners. For the undersampling component, we further propose a novel undersampling technique, Neighborhood Undersampling (NUS), which selects majority instances based on their local neighborhood information. The performance of NUS-SE is evaluated against non-resampling-based stacked ensembles as baseline methods. The experiments demonstrate that the proposed NUS-SE, an undersampling-based stacked ensemble, achieves better performance than the non-resampling-based stacked ensembles.
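
The missing-metadata problem described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the nearest-centroid base learner, plain random undersampling, the toy data, and every function name below are assumptions chosen to keep the example self-contained. The final fill step is only one plausible reading of "out-of-subset" prediction, and the abstract does not confirm how S-OOS actually works.

```python
import math
import random


def fit_centroids(X, y):
    """Toy base learner (hypothetical stand-in): one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict_score(model, x):
    """Score in [0, 1]; higher means closer to the class-1 centroid."""
    d0 = math.dist(x, model[0])
    d1 = math.dist(x, model[1])
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


def undersample(y, rng):
    """Random undersampling: all minority rows plus as many majority rows."""
    minority = [i for i, yi in enumerate(y) if yi == 1]
    majority = [i for i, yi in enumerate(y) if yi == 0]
    return sorted(minority + rng.sample(majority, len(minority)))


def cv_predictions(X, y, subset, k, rng):
    """CV-prediction restricted to `subset`; rows outside it stay None."""
    meta = [None] * len(y)
    order = subset[:]
    rng.shuffle(order)
    for f in range(k):
        held_out = order[f::k]
        held = set(held_out)
        train = [i for i in order if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            meta[i] = predict_score(model, X[i])
    return meta


def fill_out_of_subset(X, y, subset, meta):
    """Assumed completion step: a model trained on the whole subset
    scores every row that the undersampling left out."""
    model = fit_centroids([X[i] for i in subset], [y[i] for i in subset])
    return [predict_score(model, X[i]) if m is None else m
            for i, m in enumerate(meta)]


# Toy imbalanced data: 90 majority (class 0) and 10 minority (class 1) rows.
rng = random.Random(0)
X = ([[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(90)]
     + [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(10)])
y = [0] * 90 + [1] * 10

subset = undersample(y, rng)
meta = cv_predictions(X, y, subset, k=5, rng=rng)
missing = sum(m is None for m in meta)       # rows with no prediction
complete = fill_out_of_subset(X, y, subset, meta)
print(missing, sum(m is None for m in complete))
```

In this sketch, the 80 majority rows dropped by undersampling receive no CV-prediction, which is the kind of gap the abstract describes. Repeating the procedure over several undersampled subsets would give each base learner its own complete metadata column.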

Item Type: Article
Uncontrolled Keywords: Imbalanced classification; Class imbalance; Stacked generalization; Stacking; Super learning; Stacked ensemble
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology > Department of Artificial Intelligence
Faculty of Computer Science & Information Technology > Department of Computer System & Technology
Depositing User: Ms Zaharah Ramly
Date Deposited: 25 May 2022 04:29
Last Modified: 25 May 2022 04:29
URI: http://eprints.um.edu.my/id/eprint/27148
