Mining Gene Expression Profile with Missing Values: An Integration of Kernel PCA and Robust Singular Values Decomposition

Islam, Md. Saimul and Hoque, Md. Aminul and Islam, Md. Sahidul and Ali, Mohammad and Hossen, Md. Bipul and Binyamin, Md. and Merican, Amir Feisal and Akazawa, Kohei and Kumar, Nishith and Sugimoto, Masahiro (2018) Mining Gene Expression Profile with Missing Values: An Integration of Kernel PCA and Robust Singular Values Decomposition. Current Bioinformatics, 14 (1). pp. 78-89. ISSN 1574-8936, DOI

Full text not available from this repository.

Background: Gene expression profiling and transcriptomics provide valuable information about the role of genes that are differentially expressed between two or more samples. Analysing high-throughput DNA microarray data that contain missing values under various experimental conditions is both important and challenging.

Objectives: Graphical visualizations of the expression of all genes in a particular cell provide holistic views of gene expression patterns, which improve our understanding of cellular systems under normal and pathological conditions. However, current visualization methods are sensitive to missing values, which are frequently observed in microarray-based gene expression profiling and can affect the subsequent statistical analyses.

Methods: In this study, we addressed the problem of missing values by comparing different imputation methods using the gene expression biplot (GE biplot), one of the most popular gene visualization techniques. The effects of missing values on mining differentially expressed genes in gene expression data were evaluated using four well-known imputation methods: Robust Singular Value Decomposition (Robust SVD), Column Average (CA), Column Median (CM), and K-nearest Neighbors (KNN). The Frobenius norm and absolute distances were used to measure the accuracy of the methods.

Results: Three numerical experiments were performed, using (i) simulated data, (ii) publicly available colon cancer data, and (iii) leukemia data, to analyze the performance of each method. The results showed that CM and KNN performed better than Robust SVD and CA at identifying the index gene profile in the biplot visualization, both in the simulation study and in the colon cancer and leukemia microarray datasets.

Conclusion: The impact of missing values on the GE biplot was smaller when the data matrix was imputed by KNN than by CM. This study concluded that KNN performed satisfactorily in generating a GE biplot in the presence of missing values in microarray data. © 2019 Bentham Science Publishers.
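
As an illustration of two of the simpler imputation methods named in the abstract (Column Average and Column Median) and of Frobenius-norm accuracy, the following is a minimal sketch, not the authors' implementation. The toy matrix, the positions of the missing entries, and the function names `impute_columns` and `frobenius_error` are all assumptions made for the example; Robust SVD and KNN imputation are not shown.

```python
import numpy as np

def impute_columns(X, stat="mean"):
    """Fill each NaN with the mean or median of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    # Column-wise statistic computed over the non-missing entries only.
    fill = np.nanmean(X, axis=0) if stat == "mean" else np.nanmedian(X, axis=0)
    out = X.copy()
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = fill[cols]
    return out

def frobenius_error(X_true, X_imputed):
    """Frobenius norm of the difference between the complete and imputed matrices."""
    return np.linalg.norm(X_true - X_imputed, ord="fro")

# Hypothetical complete expression matrix (rows and columns are arbitrary here).
X_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 10.0],
                   [10.0, 11.0, 30.0]])

# Knock out two entries to simulate missing values.
X_missing = X_true.copy()
X_missing[0, 0] = np.nan
X_missing[2, 2] = np.nan

X_ca = impute_columns(X_missing, stat="mean")    # Column Average (CA)
X_cm = impute_columns(X_missing, stat="median")  # Column Median (CM)

err_ca = frobenius_error(X_true, X_ca)
err_cm = frobenius_error(X_true, X_cm)
print("CA Frobenius error:", err_ca)
print("CM Frobenius error:", err_cm)
```

Because the complete matrix is known in this toy setting, a smaller Frobenius error indicates an imputed matrix closer to the original; on real microarray data the same comparison requires artificially removing entries whose true values are known.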

Item Type: Article
Uncontrolled Keywords: Gene expression profile; simulation; GE biplot; Kernel principal component analysis; singular value decomposition
Subjects: H Social Sciences > HA Statistics
Q Science > QH Natural history
Divisions: Faculty of Science > Institute of Biological Sciences
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 22 Mar 2020 11:10
Last Modified: 22 Mar 2020 11:10
