Partition-Based Clustering Algorithms Applied to Mixed Data for Educational Data Mining: A Survey From 1971 to 2024

Dutt, Ashish and Ismail, Maizatul Akmar and Herawan, Tutut and Hashem, Ibrahim Abaker (2024) Partition-Based Clustering Algorithms Applied to Mixed Data for Educational Data Mining: A Survey From 1971 to 2024. IEEE Access, 12. pp. 172923-172942. ISSN 2169-3536, DOI https://doi.org/10.1109/ACCESS.2024.3496929.

Full text not available from this repository.
Official URL: https://doi.org/10.1109/ACCESS.2024.3496929

Abstract

Educational Data Mining (EDM) is the application of data mining methods in the educational domain. EDM data are frequently mixed, i.e., they combine categorical (text) and numerical variables. Grouping, or clustering, such data is challenging because similarity between mixed-type records is poorly defined. Existing partition clustering algorithms for such data follow two approaches. The first is data-type conversion, where all variables are converted to a single data type. The second is a mixed approach, where the similarity measures of the different data types are merged into a single measure for mixed data, either through a weighted sum as in Gower's distance or through a mixed dissimilarity function as used in the k-Medoids algorithm. Data-type conversion causes information loss, an aspect not discussed in the existing research literature. This study systematically reviews fifty-three years of research, from 1971 to 2024, on partition clustering algorithms applied to mixed data in EDM. A review of 104 research articles found that most partitional clustering algorithms handle either continuous or categorical variables, but not mixed variables. Researchers and practitioners often cite the lack of methods for analyzing continuous and categorical variables together. Developing machine learning algorithms that can inherently handle the mixed data present in the educational domain is therefore increasingly important. In addition to a comparative analysis and an analysis based on several factors, the article identifies research gaps and outlines future insights.
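The weighted-sum approach mentioned in the abstract can be illustrated with a minimal sketch of Gower's distance in Python. This is not the survey authors' code. It assumes the textbook form of Gower's measure: numeric variables contribute a range-normalized absolute difference, categorical variables contribute simple matching (0 if equal, 1 otherwise), and the per-variable terms are combined as a weighted mean. The function names `numeric_ranges` and `gower_distance` and the student records are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Sequence, Union

# A record mixes numeric values (float) and categorical values (str).
Value = Union[float, str]


def numeric_ranges(rows: Sequence[Sequence[Value]], is_numeric: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric column; 0.0 placeholder for categorical columns."""
    ranges: List[float] = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[k]) for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(0.0)
    return ranges


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    is_numeric: Sequence[bool],
    ranges: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of per-variable dissimilarities, each in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(a)
    total = 0.0
    weight_sum = 0.0
    for k, numeric in enumerate(is_numeric):
        if numeric:
            # Numeric variable: absolute difference scaled by the column range.
            r = ranges[k]
            d = abs(float(a[k]) - float(b[k])) / r if r > 0 else 0.0
        else:
            # Categorical variable: simple matching (0 if equal, 1 otherwise).
            d = 0.0 if a[k] == b[k] else 1.0
        total += weights[k] * d
        weight_sum += weights[k]
    return total / weight_sum


# Hypothetical student records: (exam score, weekly study hours, programme).
students = [
    (72.0, 10.0, "CS"),
    (85.0, 14.0, "IT"),
    (60.0, 4.0, "CS"),
]
is_numeric = [True, True, False]
ranges = numeric_ranges(students, is_numeric)  # [25.0, 10.0, 0.0]

print(round(gower_distance(students[0], students[1], is_numeric, ranges), 2))  # 0.64
print(round(gower_distance(students[0], students[2], is_numeric, ranges), 2))  # 0.36
```

Range normalization keeps every per-variable term in [0, 1], so a numeric column with large units cannot dominate the categorical ones. The optional `weights` argument corresponds to the weighted-sum idea. Gower's original coefficient also handles missing values through indicator terms, which this sketch omits.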

Item Type: Article
Funders: Monash University (I-M010-SED-000198)
Uncontrolled Keywords: Clustering algorithms; unsupervised learning; data mining
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Computer Science & Information Technology > Department of Information System
Depositing User: Ms. Juhaida Abd Rahim
Date Deposited: 20 Jan 2025 03:46
Last Modified: 20 Jan 2025 03:46
URI: http://eprints.um.edu.my/id/eprint/47635
